Abstract

Coronaviruses are large, enveloped RNA viruses of both medical and
veterinary importance. Interest in this viral family has intensified in
the past few years as a result of the identification of a newly emerged
coronavirus as the causative agent of severe acute respiratory syndrome
(SARS). At the molecular level, coronaviruses employ a variety of
unusual strategies to accomplish a complex program of gene expression.
Coronavirus replication entails ribosomal frameshifting during genome
translation, the synthesis of both genomic and multiple subgenomic RNA
species, and the assembly of progeny virions by a pathway that is unique
among enveloped RNA viruses. Progress in the investigation of these
processes has been enhanced by the development of reverse genetic
systems, an advance that was heretofore obstructed by the enormous size
of the coronavirus genome. This review summarizes both classical and
contemporary discoveries in the study of the molecular biology of these
infectious agents, with particular emphasis on the nature and recognition
of viral receptors, viral RNA synthesis, and the molecular interactions
governing virion assembly.


I. Introduction 

Coronaviruses are a family of enveloped RNA viruses that are 
distributed widely among mammals and birds, causing principally 
respiratory or enteric diseases but in some cases neurologic illness or 
hepatitis (Lai and Holmes, 2001). Individual coronaviruses usually 
infect their hosts in a species-specific manner, and infections can be 
acute or persistent. Infections are transmitted mainly via respiratory 
and fecal-oral routes. The most distinctive feature of this viral family is 
genome size: coronaviruses have the largest genomes among all RNA 
viruses, including those RNA viruses with segmented genomes. This 
expansive coding capacity seems to both provide and necessitate a 
wealth of gene-expression strategies, most of which are incompletely 
understood. 

Two prior reviews with the same title as this one have appeared 
in the Advances in Virus Research series (Lai and Cavanagh, 1997; 
Sturman and Holmes, 1983). The earlier of the two noted that the 
recognition of coronaviruses as a separate virus family occurred in 
the 1960s, in the wake of the discovery of several new human
respiratory pathogens, certain of which, it was realized, appeared highly
similar to the previously described avian infectious bronchitis virus 
(IBV) and mouse hepatitis virus (MHV) (Almeida and Tyrrell, 1967). 
These latter viruses had a characteristic morphology in negative-stained 
electron microscopy, marked by a “fringe” of surface structures described 
as “spikes” (Berry et al., 1964) or “club-like” projections (Becker et al., 
1967). Such structures were less densely distributed and differently 
shaped than those of the myxoviruses. To some, the fringe resembled 
the solar corona, giving rise to the name that was ultimately assigned to 
the group (Almeida et al., 1968). Almost four decades later, recognition of 
the same characteristic virion morphology alerted the world to the
emergence of another new human respiratory pathogen: the coronavirus
responsible for the devastating outbreak of severe acute respiratory 
syndrome (SARS) in 2002-2003 (Ksiazek et al., 2003; Peiris et al., 
2003). The sudden appearance of SARS has stimulated a burst of new 
research to understand the basic replication mechanisms of members of 
this family of viral agents, as a means toward their control and
prophylaxis. Thus, the time is right to again assess the state of our collective
knowledge about the molecular biology of coronaviruses. 

Owing to limitations imposed by both space and the expertise of the 
author, “molecular biology” will be considered here in the narrower
sense, that is, the molecular details of the cellular replication of
coronaviruses. No attempt will be made to address matters of pathogenesis,
viral immunology, or epidemiology. For greater depth and differences of 
emphasis in particular areas, as well as for historical perspectives, the 
reader is referred to the two excellent predecessors of this review (Lai 
and Cavanagh, 1997; Sturman and Holmes, 1983) and also to volumes 
edited by Siddell (1995) and Enjuanes (2005). 


II. Taxonomy 

Coronaviruses are currently classified as one of the two genera in 
the family Coronaviridae (Enjuanes et al., 2000b). However, it is likely 
that the coronaviruses, as well as the other genus within the 
Coronaviridae, the toroviruses (Snijder and Horzinek, 1993), will each be 
accorded the taxonomic status of family in the near future (Gonzalez et al., 
2003). Therefore, throughout this review, the coronaviruses are referred 
to as a family. Both the coronaviruses and the toroviruses, in addition to 
two other families, the Arteriviridae (Snijder and Meulenberg, 1998) 
and the Roniviridae (Cowley et al., 2000; Dhar et al., 2004), have been 
grouped together in the order Nidovirales. This higher level of
organization recognizes a relatedness among these families that sets them
apart from other nonsegmented positive-strand RNA viruses. The most 
salient features that all nidoviruses have in common are gene expression
through transcription of a set of multiple 3'-nested subgenomic RNAs;
expression of the replicase polyprotein via ribosomal frameshifting;
unique enzymatic activities among the replicase protein products; a
virion membrane envelope; and a multispanning integral membrane 
protein in the virion. The first of these qualities provides the name 
for the order, which derives from the Latin nido for nest (Enjuanes 
et al., 2000a). In contrast to their commonalities, however, nidovirus 
families differ from one another in distinct ways, most conspicuously 
in the numbers, types, and sizes of the structural proteins in their 
virions and in the morphologies of their nucleocapsids. A more detailed 
comparison of characteristics of these virus families has been given 
by Enjuanes et al. (2000b) and Lai and Cavanagh (1997). 

Members of the coronavirus family have been sorted into three
groups (Table I), which, it has been proposed, are sufficiently divergent
to merit the taxonomic status of genera (Gonzalez et al., 2003).
Classification into groups was originally based on antigenic relationships.
However, such a criterion reflects the properties of a limited subset of 
viral proteins, and cases have arisen where clearly related viruses in 
group 1 were found not to be serologically cross-reactive (Sanchez 
et al., 1990). Consequently, sequence comparisons of entire viral
genomes (or of as much genomic sequence as is available) have come to be
the basis for group classification (Gorbalenya et al., 2004). Almost all 
group 1 and group 2 viruses have mammalian hosts, with human 
coronaviruses falling into each of these groups. Viruses of group 3, by 
contrast, have been isolated solely from avian hosts. Most of the
coronaviruses in Table I have been studied for decades, and, by the turn of
the century, the scope of the family seemed to be fairly well-defined. 
Accordingly, it came as quite a shock, in 2003, when the causative 
agent of SARS was found to be a coronavirus (SARS-CoV). Equally 
astonishing have been the outcomes of renewed efforts, following the 
SARS epidemic, to detect previously unknown viruses; these
investigations have led to the discovery of two more human respiratory
coronaviruses, HCoV-NL63 (van der Hoek et al., 2004) and
HCoV-HKU1 (Woo et al., 2005). Three distinct bat coronaviruses have also
been isolated: two are members of group 1, and the third, in group 2, is 
a likely precursor of the human SARS-CoV (Lau et al., 2005; Li et al., 
2005c; Poon et al., 2005). In addition, new IBV-like viruses have been 
found that infect geese, pigeons, and ducks (Jonassen et al., 2005).

In almost all cases, the assignment of a coronavirus species to a given 
group has been unequivocal. Exceptionally, the classification of
SARS-CoV has provoked considerable controversy. The original, unrooted
phylogenetic characterizations of the SARS-CoV genome sequence posited
this virus to be roughly equidistant from each of the three previously
established groups. It was thus proposed to be the first recognized
member of a fourth group of coronaviruses (Marra et al., 2003; Rota
et al., 2003). However, a subsequently constructed phylogeny based on
gene 1b, which contains the viral RNA-dependent RNA polymerase and
which was rooted in the toroviruses as an outgroup, concluded that
SARS-CoV is most closely related to the group 2 coronaviruses (Snijder
et al., 2003). In the same vein, it was noted that regions of gene 1a of
SARS-CoV contain domains that are unique to the group 2 coronaviruses
(Gorbalenya et al., 2004). Other analyses of a subset of structural
gene sequences (Eickmann et al., 2003) and of RNA secondary
structures in the 3' untranslated region (3' UTR) of the genome (Goebel
et al., 2004b) also supported a group 2 assignment. By contrast, some
authors have argued, based on bioinformatics methods, that the ancestor
of SARS-CoV was derived from multiple recombination events among
progenitors from all three groups (Rest and Mindell, 2003; Stanhope
et al., 2004; Stavrinides and Guttman, 2004). While these latter studies
assume that historically there has been limitless opportunity for
intergroup recombination, there is no well-documented example of
recombination between extant coronaviruses of different groups.
Moreover, it is not clear that intergroup recombination is even possible,
owing to replicative incompatibilities among the three coronavirus groups
(Goebel et al., 2004b). Therefore, although SARS-CoV does indeed have
unique features, the currently available evidence best supports the
conclusion that it is more closely allied with the group 2 coronaviruses
and that it has not sufficiently diverged to constitute a fourth group
(Gorbalenya et al., 2004).


TABLE I
Coronavirus Species and Groups

Group  Designation   Species                                   Host      GenBank accession number*

1      TGEV          Transmissible gastroenteritis virus       Pig       AJ271965 [g]
       PRCoV         Porcine respiratory coronavirus           Pig       Z24675 [p]
       FIPV          Feline infectious peritonitis virus       Cat       AY994055 [g]
       FCoV          Feline enteric coronavirus                Cat       Y13921 [p]
       CCoV          Canine coronavirus                        Dog       D13096 [p]
       HCoV-229E     Human coronavirus strain 229E             Human     AF304460 [g]
       PEDV          Porcine epidemic diarrhea virus           Pig       AF353511 [g]
       HCoV-NL63     Human coronavirus strain NL63             Human     AY567487 [g]
       Bat-CoV-61    Bat coronavirus strain 61                 Bat       AY864196 [p]
       Bat-CoV-HKU2  Bat coronavirus strain HKU2               Bat       AY594268 [p]

2      MHV           Mouse hepatitis virus                     Mouse     AY700211 [g]
       BCoV          Bovine coronavirus                        Cow       U00735 [g]
       RCoV          Rat coronavirus                           Rat       AF088984 [p]
       SDAV          Sialodacryoadenitis virus                 Rat       AF207551 [p]
       HCoV-OC43     Human coronavirus strain OC43             Human     AY903460 [g]
       HEV           Hemagglutinating encephalomyelitis virus  Pig       AF481863 [p]
       PCoV†         Puffinosis coronavirus                    Puffin    AJ544718 [p]
       ECoV          Equine coronavirus                        Horse     AY316300 [p]
       CRCoV         Canine respiratory coronavirus            Dog       CQ772298 [p]
       SARS-CoV      SARS coronavirus                          Human     AY278741 [g]
       HCoV-HKU1     Human coronavirus strain HKU1             Human     AY597011 [g]
       Bat-SARS-CoV  Bat SARS coronavirus                      Bat       DQ022305 [g]

3      IBV           Infectious bronchitis virus               Chicken   AJ311317 [g]
       TCoV          Turkey coronavirus                        Turkey    AY342357 [p]
       PhCoV         Pheasant coronavirus                      Pheasant  AJ618988 [p]
       GCoV          Goose coronavirus                         Goose     AJ871017 [p]
       PCoV†         Pigeon coronavirus                        Pigeon    AJ871022 [p]
       DCoV          Duck coronavirus                          Mallard   AJ871024 [p]

* One representative GenBank accession number is given for each species.
When available, a complete genomic sequence (denoted [g]) is provided;
otherwise, the largest available partial sequence (denoted [p]) is given.

† Unique designations have not yet been formulated for these two viruses.


III. Virion Morphology, Structural Proteins, and Accessory Proteins 
A. Virus and Nucleocapsid 

Coronaviruses are roughly spherical and moderately pleiomorphic
(Fig. 1). Virions have typically been reported to have average diameters
of 80-120 nm, but extreme sizes as small as 50 nm and as large as
200 nm are occasionally given in the older literature (Oshiro, 1973;
McIntosh, 1974). The surface spikes or peplomers of these viruses,
variously described as club-like, pear-shaped, or petal-shaped, project
some 17-20 nm from the virion surface (McIntosh, 1974), having a thin
base that swells to a width of about 10 nm at the distal extremity
(Sugiyama and Amano, 1981). For some coronaviruses a second set of
projections, 5-10 nm long, forms an undergrowth beneath the major spikes
(Guy et al., 2000; Patel et al., 1982; Sugiyama and Amano, 1981). These
shorter structures are now known to be the hemagglutinin-esterase (HE)
protein that is found in a subset of group 2 coronaviruses (Section III.G).

At least some of the heterogeneity in coronavirus particle morphology 
can be attributed to the distorting effects of negative-staining
procedures. Freeze-dried (Roseto et al., 1982) and cryo-electron microscopic
(Risco et al., 1996) preparations of BCoV and TGEV, respectively, 
showed much more homogeneous populations of virions, with diameters 
10-30 nm greater than virions in comparable samples prepared by 
negative staining. Extraordinary three-dimensional images have been 
obtained for SARS-CoV virions emerging from infected Vero cells (Ng 
et al., 2004). These scanning electron micrographs and atomic force 
micrographs reveal knobby, rosette-like viral particles resembling tiny 
cauliflowers. It will be exciting to see future applications of advanced 
imaging techniques to the study of coronavirus structure. 

The internal component of the coronavirus virion is obscure in electron
micrographs of whole virions. In negative-stained images the core
appears as an indistinct mass with a densely staining center, giving the
virion a “punched-in” spherical appearance. Imaging of virions that have
burst spontaneously, expelling their contents, or that have been treated
with nonionic detergents has allowed visualization of the coronavirus
core. Such analyses led to the attribution of another distinguishing
characteristic to the coronavirus family: that its members possess
helically symmetric nucleocapsids. Such nucleocapsid symmetry is the rule
for negative-strand RNA viruses, but almost all positive-strand RNA
animal viruses have icosahedral ribonucleoprotein capsids. However,
although it is fairly well accepted that coronaviruses have helical
nucleocapsids, there are surprisingly few published data that bear on
this issue. Additionally, the reported results vary considerably with
both the viral species and the method of preparation. The earliest study
of nucleocapsids from spontaneously disrupted HCoV-229E virions found
tangled, threadlike structures 8-9 nm in diameter; these were unraveled
or clustered to various degrees and, in rare cases, retained some of the
shape of the parent virion (Kennedy and Johnson-Lussenburg, 1975/76). A
subsequent analysis of spontaneously disrupted virions of HCoV-229E and
MHV observed more clearly helical nucleocapsids, with diameters of
14-16 nm and hollow cores of 3-4 nm (Macnaughton et al., 1978). The most
highly resolved images of any coronavirus nucleocapsid were obtained
with NP-40-disrupted HCoV-229E virions (Caul et al., 1979). These
preparations showed filamentous structures 9-11 or 11-13 nm in diameter,
depending on the method of staining, with a 3-4 nm central canal. The
coronavirus nucleocapsid was noted to be thinner in cross-section than
those of paramyxoviruses and also to lack the sharply segmented
“herringbone” appearance characteristic of paramyxovirus nucleocapsids.
By contrast, in early studies, IBV and TGEV nucleocapsids were refractory
to the techniques that had been successful with other viruses.
Visualization of IBV nucleocapsids, which seemed to be very sensitive to
degradation (Macnaughton et al., 1978), was finally achieved by electron
microscopy of viral samples prepared by carbon-platinum shadowing (Davies
et al., 1981). This revealed linear strands, some as long as 6-7 μm,
which were only 1.5 nm thick, suggesting that they represented unwound
helices. TGEV, on the other hand, was found to be more resistant to
nonionic detergents. Treatment of virions of this species with NP-40
resulted in spherical subviral particles with no threadlike substructure
visible (Garwes et al., 1976). The TGEV core was later seen as a
spherically symmetric, possibly icosahedral, superstructure that only
dissociated further into a helical nucleocapsid following Triton X-100
treatment of virions (Risco et al., 1996). Such a collection of incomplete
and often discrepant results makes it clear that much further examination
of the internal structure of coronavirus virions is warranted. It would
substantially aid our understanding of coronavirus structure and assembly
if we had available a detailed description of nucleocapsid shape, length,
diameter, helical repeat distance, and protein:RNA stoichiometry.


B. Spike Protein (S) 

There are three protein components of the viral envelope (Fig. 1). 
The most prominent of these is the S glycoprotein (formerly called E2) 
(Cavanagh, 1995), which mediates receptor attachment and viral and 
host cell membrane fusion (Collins et al., 1982). The S protein is a very
large, N-exo, C-endo transmembrane protein that assembles into trimers
(Delmas and Laude, 1990; Song et al., 2004) to form the distinctive
surface spikes of coronaviruses (Fig. 2). S protein is inserted into the
endoplasmic reticulum (ER) via a cleaved, amino-terminal signal peptide
(Cavanagh et al., 1986b). The ectodomain makes up most of the molecule,
with only a small carboxy-terminal segment (of 71 or fewer of the total
1162-1452 residues) constituting the transmembrane domain and
endodomain. Monomers of S protein, prior to glycosylation, are
128-160 kDa, but molecular masses of the glycosylated forms of
full-length monomers fall in the range of 150-200 kDa. The S molecule is
thus highly glycosylated, and this modification is exclusively N-linked
(Holmes et al., 1981; Rottier et al., 1981).

Fig. 2. ... amino-terminal S1 and the carboxy-terminal S2 portions of the
molecule. The arrowhead marks the site of cleavage for those S proteins
that become cleaved by cellular protease(s). The signal peptide and
regions of mapped receptor-binding domains (RBDs) are shown in S1. The
heptad repeat regions (HR1 and HR2), putative fusion peptide (F),
transmembrane domain, and endodomain are indicated in S2. At the left is
a model for the S protein trimer.

S protein ectodomains have from 19 to 39 potential consensus
glycosylation sites,
but a comprehensive mapping of actual glycosylation has not yet been 
reported for any coronavirus. A mass spectrometric analysis of the 
SARS-CoV S protein has shown that at least 12 of the 23 candidate 
sites are glycosylated in this molecule (Krokhin et al., 2003). For the
TGEV S protein, it has been demonstrated that the early steps of 
glycosylation occur cotranslationally, but that terminal glycosylation 
is preceded by trimerization, which can be rate-limiting in S protein 
maturation (Delmas and Laude, 1990). In addition, glycosylation of 
TGEV S may assist monomer folding, given that tunicamycin inhibition
of high-mannose transfer was found to also block trimerization.
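The “potential consensus glycosylation sites” counted above are N-linked sequons, Asn-X-Ser/Thr, where X is any residue other than proline. The minimal sketch below scans a one-letter protein sequence for that motif. The function name and the toy sequence are hypothetical, and the sketch only finds candidate sites: it says nothing about which sites are actually glycosylated, which is the gap the mass spectrometric work above begins to fill.

```python
import re

# N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sites (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq: str) -> list[int]:
    """Return 0-based positions of the Asn in each candidate N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "MKNVSAQNPTLLNGT"  # hypothetical toy sequence, not a real S protein
    print(find_sequons(toy))  # [2, 12]; the N-P-T at position 7 is excluded
```

Counting candidate sequons this way is what yields figures such as “19 to 39 potential sites”; confirming occupancy still requires experimental data.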

The S protein ectodomain has between 30 and 50 cysteine residues, 
and within each coronavirus group the positions of cysteines are well 
conserved (Abraham et al., 1990; Eickmann et al., 2003). However, as
with glycosylation, a comprehensive mapping of disulfide linkages has 
not yet been achieved for any coronavirus S protein. 

In most group 2 and all group 3 coronaviruses, the S protein is
cleaved by a trypsin-like host protease into two polypeptides, S1 and
S2, of roughly equal sizes. Even for uncleaved S proteins, that is, those
of the group 1 coronaviruses and SARS-CoV, the designations S1 and
S2 are used for the amino-terminal and carboxy-terminal halves of the
S protein, respectively. Peptide sequencing has shown that cleavage
occurs following the last residue in a highly basic motif: RRFRR in
IBV S protein (Cavanagh et al., 1986b), RRAHR in MHV strain A59
S protein (Luytjes et al., 1987), and KRRSRR in BCoV S protein
(Abraham et al., 1990). Similar cleavage sites are predicted from the
sequences of other group 2 S proteins, except that of SARS-CoV. It has
been noted that the S protein of MHV strain JHM has a cleavage motif
(RRARR) more basic than that found in MHV strain A59 (RRAHR). An
expression study has shown that this difference accounts for the almost
total extent of cleavage of the JHM S protein that is seen in cell
lines in which the A59 S protein undergoes only partial cleavage (Bos
et al., 1995).
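Because cleavage occurs immediately after the last residue of the basic motif, the split into S1 and S2 can be sketched as a simple string operation once the motif is known. The helper names and the placeholder sequence below are hypothetical; only the motif strings come from the text. Counting Arg and Lys is used here as a crude, assumed proxy for the basicity comparison between RRARR and RRAHR (histidine is left out because it is only weakly basic near neutral pH). Real protease specificity depends on much more than residue counts.

```python
def split_at_cleavage_motif(seq: str, motif: str) -> tuple[str, str]:
    """Split seq immediately after the first occurrence of motif.

    The first part (S1-like) ends with the motif; the second part (S2-like)
    starts with the residue that follows it.
    """
    i = seq.find(motif)
    if i < 0:
        raise ValueError(f"motif {motif!r} not found")
    cut = i + len(motif)
    return seq[:cut], seq[cut:]


def arg_lys_count(motif: str) -> int:
    """Count the strongly basic residues (Arg, Lys) in a motif."""
    return sum(motif.count(aa) for aa in "RK")


if __name__ == "__main__":
    motifs = {"IBV": "RRFRR", "MHV-A59": "RRAHR", "MHV-JHM": "RRARR", "BCoV": "KRRSRR"}
    for virus, motif in motifs.items():
        print(virus, motif, arg_lys_count(motif))
    # Hypothetical placeholder sequence around the A59 motif:
    print(split_at_cleavage_motif("XXXXRRAHRSVSSS", "RRAHR"))
```

By this count the JHM motif has four Arg residues against three Arg residues in the A59 motif, matching the qualitative statement that RRARR is the more basic of the two.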

The S1 domain is the most divergent region of the molecule, both
across and within the three coronavirus groups. Even among strains
and isolates of a single coronavirus species, the sequence of S1 can
vary extensively (Gallagher et al., 1990; Parker et al., 1989; Wang
et al., 1994). By contrast, the most conserved part of the molecule
across the three coronavirus groups is a region that encompasses the
S2 portion of the ectodomain, plus the start of the transmembrane
domain (de Groot et al., 1987). An early model for the coronavirus
spike, which has held up well in light of subsequent work, proposed
that the S1 domains of the S protein oligomer constitute the bulb
portion of the spike. The stalk portion of the spike, on the other hand,
was envisioned to be a coiled-coil structure, analogous to that in
influenza HA protein, formed by association of heptad repeat regions of
the S2 domains of monomers (de Groot et al., 1987). The roles of these
two regions of the S protein in the initiation of infection are discussed
in Section IV.A.
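The coiled-coil model rests on the heptad repeat, a seven-residue pattern whose positions are conventionally labeled a through g, with hydrophobic residues favored at a and d, where the helices pack together. The sketch below assumes the heptad register is already known and uses an illustrative set of hydrophobic residues; it scores how well a segment fits the pattern and is not a coiled-coil predictor. The function name and the idealized repeat are hypothetical, not S2 sequence.

```python
HYDROPHOBIC = set("ILVMF")  # illustrative choice of core-favoring residues


def heptad_core_fraction(seq: str, a_offset: int = 0) -> float:
    """Fraction of residues at heptad positions a and d that are hydrophobic.

    a_offset is the index of a residue assumed to occupy position a; positions
    before it are ignored. Within each heptad, a is offset 0 and d is offset 3.
    """
    core = [aa for i, aa in enumerate(seq)
            if i >= a_offset and (i - a_offset) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


if __name__ == "__main__":
    ideal = "LAELAKE" * 4  # hypothetical idealized repeat: Leu at a and d
    print(heptad_core_fraction(ideal))     # 1.0 in the correct register
    print(heptad_core_fraction(ideal, 1))  # 0.0 when the register is shifted
```

The drop to zero when the register is shifted by one residue illustrates why the a/d periodicity, rather than overall hydrophobicity, is the signature of a heptad repeat region.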


C. Membrane Protein (M) 

The M glycoprotein (formerly called E1) is the most abundant
constituent of coronaviruses (Sturman, 1977; Sturman et al., 1980) and
gives the virion envelope its shape. The preglycosylated M polypeptide
ranges in size from 25 to 30 kDa (221-262 amino acids), but multiple
higher-molecular-mass glycosylated forms are often observed by
SDS-PAGE (Krijnse Locker et al., 1992a). The M protein of MHV has also
been noted to multimerize under standard conditions of SDS-PAGE
(Sturman, 1977).
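The residue counts and unmodified masses quoted for the S protein (1162-1452 residues, 128-160 kDa) and the M protein (221-262 residues, 25-30 kDa) can be cross-checked with the common rule of thumb of about 110 Da per amino acid residue. That average is an assumption made for this sketch, not a value from the cited studies. The E protein range (76-109 residues, 8.4-12 kDa) given in the next section is included for comparison.

```python
AVG_RESIDUE_DA = 110.0  # assumed average residue mass (rule of thumb)


def approx_kda(n_residues: int) -> float:
    """Approximate mass of an unmodified polypeptide in kDa."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    residue_ranges = {"S": (1162, 1452), "M": (221, 262), "E": (76, 109)}
    for protein, (lo, hi) in residue_ranges.items():
        print(f"{protein}: {approx_kda(lo):.1f}-{approx_kda(hi):.1f} kDa")
```

With these inputs the sketch gives roughly 127.8-159.7 kDa for S, 24.3-28.8 kDa for M, and 8.4-12.0 kDa for E. The slightly low M estimate is a reminder that actual masses depend on amino acid composition.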

M is a multispanning membrane protein with a small, amino-terminal
domain located on the exterior of the virion, or, intracellularly, in the
lumen of the ER (Fig. 3). The ectodomain is followed by three
transmembrane segments and then a large carboxy terminus comprising the
major part of the molecule. This latter domain is situated in the
interior of the virion or on the cytoplasmic face of intracellular
membranes (Rottier, 1995). M proteins within each coronavirus group are
moderately well conserved, but they are quite divergent across the three
groups. The region of M protein showing the most conservation among all
coronaviruses is a segment of some 25 residues encompassing the end of
the third transmembrane domain and the start of the endodomain; a
portion of this segment even retains homology to its torovirus
counterpart (den Boon et al., 1991). The ectodomain, which is the least
conserved part of the M molecule, is glycosylated. For most group 2
coronaviruses, glycosylation is O-linked, although two exceptions to this
pattern are MHV strain 2 (Yamada et al., 2000) and SARS-CoV (Nal et al.,
2005), both of which have M proteins with N-linked carbohydrate. Group 1
and group 3 coronavirus M proteins, by contrast, exhibit N-linked
glycosylation exclusively (Cavanagh and Davis, 1988; Garwes et al., 1984;
Jacobs et al., 1986; Stern and Sefton, 1982). At the time of its discovery
in the MHV M protein, O-linked glycosylation had not previously been seen
to occur in a viral protein (Holmes et al., 1981), and MHV M has since
been used as a model to study the sites and mechanism of this type of
posttranslational modification (de Haan et al., 1998b; Krijnse Locker
et al., 1992a; Niemann et al., 1982). Although the roles of M protein
glycosylation are not fully understood, the glycosylation status of M
can influence both organ tropism in vivo and the capacity of some
coronaviruses to induce alpha interferon in vitro (Charley and Laude,
1988; de Haan et al., 2003a; Laude et al., 1992).

Fig. 3. The membrane (M), envelope (E), and nucleocapsid (N) proteins.
At the right are linear maps of the proteins, denoting known regions of
importance, including transmembrane (tm) domains. At the left are models
for the three proteins.

The coronavirus M protein was the first polytopic viral membrane
protein to be described (Armstrong et al., 1984; Rottier et al., 1984),
and the atypical topology of the MHV and IBV M proteins was examined in
considerable depth in cell-free translation and cellular expression
studies. For both of these M proteins, the entire ectodomain was found
to be protease sensitive. However, at the other end of the molecule, no
more than 20-25 amino acids could be removed from the carboxy terminus
by protease treatment (Cavanagh et al., 1986a; Mayer et al., 1988;
Rottier et al., 1984, 1986). This pattern suggested that almost all of
the endodomain of M is tightly associated with the surface of the
membrane or that it has an unusually compact structure that is
refractory to proteolysis (Rottier, 1995). Most M proteins do not
possess a cleaved amino-terminal signal peptide (Cavanagh et al., 1986b;
Rottier et al., 1984), and for both IBV and MHV it was demonstrated that
either the first or the third transmembrane domain alone is sufficient
to function as the signal for insertion and anchoring of the protein in
its native orientation in the membrane (Krijnse Locker et al., 1992b;
Machamer and Rose, 1987; Mayer et al., 1988). The M proteins of a subset
of group 1 coronaviruses (TGEV, FIPV, and CCoV) each contain a cleavable
amino-terminal signal sequence (Laude et al., 1987), although this
element may not be required for membrane insertion (Kapke et al., 1988;
Vennema et al., 1991). Another anomalous feature of at least one group 1
coronavirus, TGEV, is that roughly one-third of its M protein assumes a
topology in which part of the endodomain constitutes a fourth
transmembrane segment, thereby positioning the carboxy terminus of the
molecule on the exterior of the virion (Risco et al., 1995). This
alternative configuration of M has yet to be demonstrated for other
coronavirus family members.
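The topological arithmetic behind these observations is a parity argument: each complete membrane crossing moves the chain to the opposite face of the membrane. The hypothetical helper below, which ignores hairpins and partially inserted segments, shows why three spans starting from an exterior (lumenal) amino terminus leave the M carboxy terminus inside the virion, whereas the fourth span proposed for part of the TGEV M population places it outside. Here "exo" stands for the virion exterior or ER lumen and "endo" for the virion interior or cytoplasm.

```python
def c_terminus_side(n_terminus_side: str, n_spans: int) -> str:
    """Face of the membrane ('exo' or 'endo') on which the C terminus ends up.

    Assumes every span is a complete crossing; hairpins are not modeled.
    """
    if n_terminus_side not in ("exo", "endo"):
        raise ValueError("side must be 'exo' or 'endo'")
    if n_spans < 0:
        raise ValueError("n_spans must be non-negative")
    if n_spans % 2 == 0:
        return n_terminus_side  # an even number of crossings returns to the start
    return "endo" if n_terminus_side == "exo" else "exo"


if __name__ == "__main__":
    print(c_terminus_side("exo", 3))  # typical M protein: 'endo'
    print(c_terminus_side("exo", 4))  # proposed TGEV M variant: 'exo'
```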

D. Envelope Protein (E) 

The E protein (formerly called sM) is a small polypeptide, ranging
from 8.4 to 12 kDa (76-109 amino acids), that is only a minor
constituent of virions (Fig. 3). Owing to its tiny size and limited quantity, E was
recognized as a virion component much later than were the other 
structural proteins, first in IBV (Liu and Inglis, 1991) and then in 
TGEV (Godet et al., 1992) and MHV (Yu et al., 1994). Its significance 
was also obscured by the fact that in some coronaviruses, the coding 
region for E protein occurs as the furthest-downstream open reading 
frame (ORF) in a bi- or tricistronic mRNA and must therefore be 
expressed by a nonstandard translational mechanism (Boursnell 
et al., 1985; Budzilowicz and Weiss, 1987; Leibowitz et al., 1988; Liu 
et al., 1991; Skinner et al., 1985; Thiel and Siddell, 1994). E protein 
sequences are extremely divergent across the three coronavirus groups 
and in some cases, among members of a single group. Nevertheless, 
the same general architecture can be discerned in all E proteins: a 
short hydrophilic amino terminus (8-12 residues), followed by a large 
hydrophobic region (21-29 residues) containing two to four cysteines,
and then a hydrophilic carboxy-terminal tail (39-76 residues), the
latter constituting most of the molecule. 
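The three-part architecture of E proteins is the kind of pattern a hydropathy profile makes visible. The sketch below assumes the Kyte-Doolittle hydropathy scale and a 19-residue sliding window (a common choice for spotting membrane-spanning stretches) to locate the most hydrophobic segment. The function name and test sequence are hypothetical, and the sketch cannot distinguish a single transmembrane pass from a hairpin, which is exactly the unresolved topology question discussed below.

```python
# Kyte-Doolittle hydropathy values, one-letter amino acid codes
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(seq: str, window: int = 19) -> tuple[int, float]:
    """Return (start index, mean hydropathy) of the most hydrophobic window.

    Ties go to the earliest window.
    """
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    scores = [KD[aa] for aa in seq.upper()]
    best_start, best_mean = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


if __name__ == "__main__":
    # Hypothetical hydrophilic / hydrophobic / hydrophilic toy sequence
    toy = "D" * 10 + "L" * 20 + "K" * 10
    start, mean = most_hydrophobic_window(toy)
    print(start, round(mean, 2))
```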

E is an integral membrane protein, as has been shown for both the 
MHV and IBV E proteins by the criterion of resistance to alkaline 
extraction (Corse and Machamer, 2000; Vennema et al., 1996), and 
membrane insertion occurs without cleavage of a signal sequence 
(Raamsman et al., 2000). The E protein of IBV has been shown to be 
palmitoylated on one or both of its two cysteine residues (Corse 
and Machamer, 2002), but it is not currently clear whether this 
modification is a general characteristic. One study of MHV E showed a
gel mobility shift of E caused by hydroxylamine treatment, which
cleaves thioester linkages (Yu et al., 1994), but attempts to incorporate
labeled palmitic acid into either the TGEV or MHV E protein have 
been unsuccessful (Godet et al., 1992; Raamsman et al., 2000). The 
topology of E in the membrane is at least partially resolved. Although 
one early report suggested a C-exo, N-endo membrane orientation for 
the TGEV E protein (Godet et al., 1992), more extensive investigations
of the MHV and IBV E proteins both concluded that the carboxy- 
terminal tail of the molecule is cytoplasmic (or, correspondingly, is 
situated in the interior of the virion) (Corse and Machamer, 2000; 
Raamsman et al., 2000). Moreover, for IBV E, it was shown that the
carboxy-terminal tail, in the absence of the membrane-bound domain, 
specifies targeting to the budding compartment (Corse and Machamer, 
2002). The status of the amino terminus is less clear, however. The IBV 
E protein amino terminus was inaccessible to antibodies at the cyto¬ 
plasmic face of the Golgi membrane, suggesting that this end of the 
molecule is situated in the lumen (corresponding to the exterior of 
the virion) (Corse and Machamer, 2000). Such a single transit, placing 
the termini of the protein on opposite faces of the membrane, would be 
consistent with the prediction, from molecular dynamics simulations, that a
broad set of E proteins occur as transmembrane oligomers (Torres
et al., 2005). Conflicting results were obtained with MHV E, however.
Based on the cytoplasmic reactivity of an engineered amino-terminal 
epitope tag, it was proposed that the MHV E protein amino terminus is 
buried within the membrane near the cytoplasmic face (Maeda et al.,
2001). This result also accords with the finding that no part of the 
MHV E protein in purified virions is accessible to protease treatment 
(Raamsman et al., 2000). Such an orientation would mean that the 
hydrophobic domain of E protein forms a hairpin, looping back through 
the membrane. This topology agrees with the outcome of a biophysical 
analysis of the SARS-CoV E protein transmembrane domain (Arbely 
et al., 2004). However, in the latter study it was asserted that the 
palindromic hairpin configuration of the transmembrane segment is 
unique to the SARS-CoV E protein, which raises the question of how the
other coronavirus E proteins are situated in the membrane and why 
the E protein of SARS-CoV should differ. 

E. Nucleocapsid Protein (N) 

The N protein, which ranges from 43 to 50 kDa, is the protein
component of the helical nucleocapsid and is thought to bind the genomic
RNA in a beads-on-a-string fashion (Laude and Masters, 1995) (Fig. 3). 


THE MOLECULAR BIOLOGY OF CORONAVIRUSES 




Based on a comparison of sequences of multiple strains, it has been 
proposed that the MHV N protein is divided into three conserved 
domains, which are separated by two highly variable spacer regions 
(Parker and Masters, 1990). Domains 1 and 2, which constitute most of 
the molecule, are rich in arginines and lysines, as is typical of many 
viral RNA-binding proteins. In contrast, the short, carboxy-terminal 
domain 3 has a net negative charge resulting from an excess of acidic 
over basic residues. While there is now considerable evidence to
support the notion that domain 3 truly constitutes a separate domain
(Hurst et al., 2005; Koetzner et al., 1992), little is known about the 
structure of the other two putative domains. The overall features of the 
three-domain model appear to extend to N proteins of coronaviruses 
in groups 1 and 3, although the boundaries between domains appear 
to be less clearly defined for these latter N proteins. There is not a 
high degree of intergroup sequence homology among N proteins, with 
the exception of a strongly conserved stretch of 30 amino acids, 
near the junction of domains 1 and 2, which contains many aromatic 
hydrophobic residues (Laude and Masters, 1995). 

The main activity of N protein is to bind to the viral RNA. Unlike the 
helical nucleocapsids of nonsegmented negative-strand RNA viruses, 
coronavirus ribonucleoprotein complexes are quite sensitive to the 
action of ribonucleases (Macnaughton et al., 1978). A significant
portion of the stability of the nucleocapsid may derive from N-N monomer
interactions (Narayanan et al., 2003b). Both sequence-specific and 
nonspecific modes of RNA binding by N have been assayed in vitro 
(Chen et al., 2005; Cologna et al., 2000; Masters, 1992; Molenkamp and 
Spaan, 1997; Nelson and Stohlman, 1993; Nelson et al., 2000; Robbins 
et al., 1986; Stohlman et al., 1988; Zhou et al., 1996). Specific RNA 
substrates that have been identified for N protein include the positive- 
sense transcription regulating sequence (Chen et al., 2005; Nelson 
et al., 2000; Stohlman et al., 1988), regions of the 3' UTR (Zhou et al., 
1996) and the N gene (Cologna et al., 2000), and the genomic RNA 
packaging signal (Cologna et al., 2000; Molenkamp and Spaan, 1997) 
(Section IV.C). The RNA-binding capability of the MHV N protein has 
been mapped to domain 2 of this molecule (Masters, 1992; Nelson and 
Stohlman, 1993). However, for IBV, two separate RNA-binding sites 
have been found to map, respectively, to amino- and carboxy-terminal 
fragments of N protein (Zhou and Collisson, 2000), and RNA-binding 
activity has been reported for a fragment of the SARS-CoV N protein 
containing parts of domains 1 and 2 (Huang et al., 2004b). 

N is a phosphoprotein, as has been shown for MHV, IBV, BCoV, TGEV, 
and SARS-CoV (Calvo et al., 2005; King and Brian, 1982; Lomniczi 
and Morser, 1981; Stohlman and Lai, 1979; Zakhartchouk et al., 
2005). For MHV N, phosphorylation occurs exclusively on serine
residues (Siddell et al., 1981; Stohlman and Lai, 1979), but in IBV N a
phosphothreonine residue was also found (Chen et al., 2005). Kinetic
analysis has shown that MHV N protein acquires phosphates rapidly 
following its synthesis (Siddell et al., 1981; Stohlman et al., 1983), and 
phosphorylation may lead to the association of N with intracellular 
membranes (Calvo et al., 2005; Stohlman et al., 1983). Although some 
15% of the amino acids of coronavirus N proteins are candidate
phosphoacceptor serines and threonines, phosphorylation appears to be
targeted to a small subset of residues. For MHV, this was concluded 
both from the degree of charge heterogeneity of N protein observed in 
two-dimensional gel electrophoresis and from the limited number of 
tryptic phosphopeptides of N that could be separated by HPLC (Bond 
et al., 1979; Wilbur et al., 1986). Mass spectrometry has been employed 
to map the sites of phosphorylation of the IBV and TGEV N proteins. For 
IBV N, this was accomplished by comparison of unphosphorylated 
N protein expressed in bacteria with phosphorylated N protein
expressed in insect cells (Chen et al., 2005). Four sites of phosphorylation
were found, two each in domains 2 and 3: Ser190, Ser192, Thr378, and
Ser379. For TGEV N, purified virions and multiple fractions from
infected cells were analyzed (Calvo et al., 2005). Here also, four sites of
phosphorylation were found, one in domain 1 and three in domain 2:
Ser9, Ser156, Ser254, and Ser256. In both of these analyses, the degree
of sequence coverage achieved did not entirely rule out the possibility of 
additional, undetected phosphorylated residues in each of these 
N proteins. 

The role of N protein phosphorylation is currently unresolved, but 
this modification has long been speculated to have regulatory
significance. In vitro binding evidence has been presented that
phosphorylated IBV N is better able to distinguish between viral and nonviral
RNA substrates than is nonphosphorylated N (Chen et al., 2005). 
Possibly related to this result is the early conclusion, inferred from 
the differential accessibilities of some monoclonal antibodies, that 
phosphorylation induces a conformational change in the MHV N
protein (Stohlman et al., 1983). It has also been found that only a subset
of the intracellular phosphorylated forms of BCoV N protein are 
incorporated into virions, suggesting that phosphorylation is linked 
to virion assembly and maturation (Hogue, 1995). The recent mapping 
of at least some of the N phosphorylation sites in some coronaviruses 
has now laid the groundwork for testing of the hypothetical functions 
of phosphorylation by reverse genetic methods. 

A number of potential activities, other than its structural role in the
virion, have been put forward for N protein. Based on the specific 
binding of N protein to the transcription-regulating sequence within 
the leader RNA, it has been proposed that N participates in viral 
transcription (Baric et al., 1988; Choi et al., 2002; Stohlman et al., 
1988). However, an engineered HCoV-229E replicon RNA that was 
devoid of the N gene and all other structural protein genes retained 
the capability to synthesize subgenomic RNA (Thiel et al., 2001b). 
Thus, if N protein does function in transcription, it must be in a 
modulatory, but not essential, capacity. Likewise, the binding of N 
protein to leader RNA has been implicated as a means for preferential 
translation of viral mRNAs (Tahara et al., 1994, 1998), although data
supporting this attractive hypothesis are, as yet, incomplete. N protein 
has also been found to enhance the efficiency of replication of replicon 
or genomic RNA in reverse genetic systems in which infections are 
initiated from engineered viral RNA (Almazan et al., 2004; Schelle 
et al., 2005; Thiel et al., 2001a; Yount et al., 2002). This may be
indicative of a direct role of N in RNA replication, but it remains possible that
the enhancement actually results from the sustained translation of a 
limiting replicase component. 

Finally, it was shown that, in addition to its presence in the
cytoplasm, IBV N protein localized to the nucleoli of about 10% of cells that
were infected with IBV or were independently expressing N protein 
(Hiscox et al., 2001). This observation was extended to the N proteins 
of MHV and TGEV, suggesting that nucleolar localization is a general 
feature of all three coronavirus groups. Such localization was proposed 
to correlate with the arrest of cell division (Wurm et al., 2001).
Additionally, both MHV and IBV N proteins were found to bind to two
nucleolar proteins, fibrillarin and nucleolin (Chen et al., 2002). It must
be noted, however, that nucleolar localization of N was not observed in 
TGEV-infected or SARS-CoV-infected cells by other groups of
investigators (Calvo et al., 2005; Rowland et al., 2005). All steps of
coronavirus replication are thought to occur outside of the nucleus. For MHV,
it was shown some time ago that viral replication could occur in 
enucleated cells or in cells treated with actinomycin D or α-amanitin,
host RNA polymerase inhibitors (Brayton et al., 1981; Wilhelmsen 
et al., 1981). By contrast, other studies reported that similar conditions 
reduced the growth yield of IBV, HCoV-229E, or FCoV (Evans and 
Simpson, 1980; Kennedy and Johnson-Lussenburg, 1979; Lewis 
et al., 1992). Even if coronavirus replication does not have an absolute 
dependence on the nucleus, the possibility remains that some viruses 
can alter host nuclear functions so as to create an environment more
favorable for viral infection. Such a modification might be brought 
about through the nuclear trafficking of one or more viral components. 


F. Genome 

The genomes of coronaviruses are nonsegmented, single-stranded 
RNA molecules of positive sense, that is, the same sense as mRNA 
(Fig. 4) (Lai and Stohlman, 1978; Lomniczi and Kennedy, 1977; 
Schochetman et al., 1977; Wege et al., 1978). Structurally they
resemble most eukaryotic mRNAs, in having both 5' caps (Lai and Stohlman,
1981) and 3' poly(A) tails (Lai and Stohlman, 1978; Lomniczi, 1977;
Schochetman et al., 1977; Wege et al., 1978). Unlike most eukaryotic
mRNAs, coronavirus genomes are extremely large: nearly three times
the size of alphavirus and flavivirus genomes and four times the size of
picornavirus genomes. Indeed, at lengths ranging from 27.3
(HCoV-229E) to 31.3 kb (MHV), coronavirus genomes are among the largest
mature RNA molecules known to biology. Again, unlike most
eukaryotic mRNAs, coronavirus genomes contain multiple ORFs. The genes
for the four canonical structural proteins discussed previously account 
for less than one-third of the coding capacity of the genome and are 
clustered at the 3' end. A single gene, which encodes the viral
replicase, occupies the 5'-most two-thirds of the genome. The invariant
gene order in all members of the coronavirus family is
5'-replicase-S-E-M-N-3'. However, engineered rearrangement of the gene order of
MHV was found to be completely tolerated by the virus (de Haan et al.,
2002b). This implies that the native order, although it became fixed
early in the evolution of the family, is not functionally essential. At the
termini of the genome are a 5' UTR, ranging from 210 to 530
nucleotides, and a 3' UTR, ranging from 270 to 500 nucleotides. The
noncoding regions between the ORFs are generally quite small; in some cases,
there is a small overlap between adjacent ORFs. Additionally, one or
a number of accessory genes are intercalated among the structural
protein genes.

Fig 4. Coronavirus genomic organization. The layout of the MHV genome is shown as
an example. All coronavirus genomes have a 5' cap and 3' poly(A) tail. The invariant
order of the canonical genes is replicase-S-E-M-N. The replicase contains two ORFs, 1a
and 1b, complete expression of which is accomplished via ribosomal frameshifting.
Accessory proteins (2a, HE, 4, 5a, and I, in the case of MHV) occur at various positions
among the canonical genes.

In common with almost all other positive-sense RNA viruses, the 
genomic RNA of coronaviruses is infectious when transfected into 
permissive host cells, as was originally shown for TGEV (Norman 
et al., 1968), IBV (Lomniczi, 1977; Schochetman et al., 1977), and 
MHV (Wege et al., 1978). The genome has multiple functions during 
infection. It acts initially as an mRNA that is translated into the huge 
replicase polyprotein, the complete synthesis of which requires a
ribosomal frameshifting event (Section V.C.1). The replicase is the only
translation product derived from the genome; all downstream ORFs 
are expressed from subgenomic RNAs. The genome next serves as the 
template for replication and transcription (Section V). Finally, the 
genome plays a role in assembly, as progeny genomes are incorporated 
into progeny virions (Section IV.C). 


G. Accessory Proteins 

Interspersed among the set of canonical genes, replicase, S, E, M, 
and N, all coronavirus genomes contain additional ORFs, in a wide 
range of configurations. As shown in Table II, these “extra” genes can 
fall in any of the genomic intervals among the canonical genes and can 
vary from as few as one (PEDV and HCoV-NL63) to as many as eight 
genes (SARS-CoV). In some cases, accessory genes can be entirely 
embedded in another ORF, such as the internal (I) gene found within the
N gene of many group 2 coronaviruses (Fischer et al., 1997a; Lapps
et al., 1987; Senanayake et al., 1992), or they can be extensively
overlapped with another gene, such as the 3b gene of SARS-CoV. In addition,
many accessory genes do not constitute the 5'-most ORF in the smallest
subgenomic RNA in which they appear, and they therefore must
require nonstandard translation mechanisms for their expression (Liu
et al., 1991). Intracellular expression has been demonstrated for a 
number of accessory proteins, but for many others it is at present 
merely speculative. 

The coronavirus accessory genes were originally labeled
nonstructural, but this is not entirely apt, since the products of some of them,
the group 2 HE protein, the I protein (Fischer et al., 1997a), and the
SARS-CoV 3a protein, have been shown to be components of virions.
Accessory genes were also previously called group-specific genes, but
this appellation has become a misnomer in light of the diversity
revealed by recently discovered coronaviruses. In general, accessory
genes are numbered according to the subgenomic RNA in whose
unique region they appear, but this nomenclature system is sometimes
overridden by historical precedent. As a result, identically numbered
genes in two different viruses, for example, the 5a genes of MHV
and IBV, do not necessarily occupy the same genomic position.
Likewise, two identically numbered genes, for example, the 3a genes of
SARS-CoV and TGEV, do not necessarily have any sequence homology.

TABLE II
Coronavirus Accessory Proteins

Group  Virus species   Accessory genes (Proteins)*
1      TGEV            [rep] - [S] - 3a, 3b - [E] - [M] - [N] - 7
       FIPV            [rep] - [S] - 3a, 3b, 3c - [E] - [M] - [N] - 7a, 7b
       HCoV-229E       [rep] - [S] - 4a, 4b - [E] - [M] - [N]
       PEDV            [rep] - [S] - 3 - [E] - [M] - [N]
       HCoV-NL63       [rep] - [S] - 3 - [E] - [M] - [N]
2      MHV             [rep] - 2a, 2b(HE) - [S] - 4 - 5a, [E] - [M] - [N], 7b(I)
       BCoV            [rep] - 2a - 2b(HE) - [S] - 4a(4.9k), 4b(4.8k) - 5(12.7k) [E] - [M] - [N], 7b(I)
       HCoV-OC43       [rep] - 2a - 2b(HE) - [S] - 5(12.9k) - [E] - [M] - [N], 7b(I)
       SARS-CoV        [rep] - [S] - 3a, 3b - [E] - [M] - 6 - 7a, 7b - 8a, 8b - [N], 9b(I)
       HCoV-HKU1       [rep] - 2(HE) - [S] - 4 - [E] - [M] - [N], 7b(I)
       Bat-SARS-CoV    [rep] - [S] - 3 - [E] - [M] - 6 - 7a, 7b - 8 - [N], 9b(I)
3      IBV             [rep] - [S] - 3a, 3b, 3c - [E] - [M] - 5a, 5b - [N]

* Accessory genes and proteins are listed only for coronaviruses for which a complete
genomic sequence is available. The protein product is indicated in parentheses in cases
where it has a different designation than the gene. Products of separate transcripts are
separated by hyphens; the transcription of accessory genes may vary among different
strains of the same virus species (O'Connor and Brian, 1999). The canonical coronavirus
genes are indicated in brackets; rep denotes replicase.

It is often speculated that the coronavirus accessory genes were 
horizontally acquired from cellular or heterologous viral sources, but 
only in two cases, the group 2 HE and 2a genes, is there good evidence 
for this proposal. HE, the most clear-cut example, is discussed 
later. A possible function for the 2a protein has been inferred from a 
bioinformatics analysis, which places it in a very large family of
cellular and viral 2',3'-cyclic phosphodiesterases (Mazumder et al., 2002).
Besides its presence in some group 2 coronaviruses, this gene also
appears in another family within the Nidovirales order, the
toroviruses (Snijder et al., 1990). Curiously, in the toroviruses, the 2a
homolog is situated as a module within the replicase polyprotein, 
suggesting either that it was acquired independently or that there 
was nonhomologous recombination between ancestors of viruses within 
the two families (Snijder et al., 1991). However, most accessory gene 
ORFs have no obvious homology to any other viral or cellular sequence 
in public databases. It is conceivable that many of them evolved in 
individual coronaviruses by the scavenging of ORFs from the virus’s 
own genome, through duplication and subsequent mutation, as has 
been proposed for several of the accessory proteins of SARS-CoV 
(Inberg and Linial, 2004). It is tempting to regard this as a possible 
origin for the SARS-CoV 3a protein, which has a topology and size 
remarkably similar to that of the M protein, although there is no 
sequence similarity between the two. Such a relationship would
parallel that in the arteriviruses, another Nidovirales family, in which
the major envelope glycoprotein is also a triple-spanning membrane 
protein and forms heterodimers with its M protein (Snijder and 
Meulenberg, 1998). 

It also needs to be considered that, although there is evidence that 
some accessory genes encode “luxury” functions for their respective 
viruses, other accessory genes may be genetic junk. Many isolates of 
IBV contain an extremely diverged segment of some 200 nucleotides 
between the N gene and the 3' UTR (Sapats et al., 1996). This was 
long considered to be a hypervariable region of the 3' UTR, although it 
was shown to be dispensable for RNA synthesis (Dalton et al., 2001). 
Intriguingly, coronavirus sequences closely related to IBV have been
characterized in pigeons (PCoV) and geese (GCoV). These sequences have one and
two additional ORFs, respectively, between the N gene and the
3' UTR (Jonassen et al., 2005). This finding suggests that the IBV
hypervariable region and the PCoV ORF are degenerate remnants of
a precursor retained in the GCoV sequence. The two GCoV ORFs, in
turn, may be vestiges of one or more functional ancestral genes, or they 
may be derived from horizontally acquired sequences that there has 
been no selective pressure to eliminate. A similar situation probably 
pertains for the SARS-CoV 8a and 8b genes. Isolates of SARS-CoV 
from marketplace animals near the source of the epidemic were found 
to contain an additional 29 nucleotides absent from all but one
previously reported human isolate, and this apparent insertion resulted in
the fusion of ORFs 8a and 8b into a single ORF 8 (Guan et al., 2003).
One scenario consistent with this observation is that loss of the 29-nt 
sequence was concomitant with the jump of the virus from animals to 
humans, although the functional significance of this loss, if any, is not 
yet clear. 

In all cases examined, through natural or engineered mutants,
accessory protein genes have been found to be nonessential for viral
replication in tissue culture. This dispensability has been determined
for the 2a and HE genes of MHV (de Haan et al., 2002a; Schwarz et al.,
1990), genes 4 and 5a of MHV (de Haan et al., 2002a; Weiss et al., 1993;
Yokomori and Lai, 1991), the I gene of MHV (Fischer et al., 1997a),
gene 7 of TGEV (Ortego et al., 2003), genes 7a and 7b of FIPV (Haijema
et al., 2003, 2004), and genes 5a and 5b of IBV (Casais et al., 2005;
Youn et al., 2005). Similarly, some accessory protein genes do not seem 
to play any role in infection of the natural host. For gene 4 (Ontiveros 
et al., 2001) and the I gene (Fischer et al., 1997a) of MHV, and for gene 
7b of FIPV (Haijema et al., 2003), selective knockout produced no 
detectable effect on pathogenesis in mice or cats, respectively. By 
contrast, disruption of gene 7 of TGEV greatly reduced viral
replication in the lung and gut of infected piglets (Ortego et al., 2003). In the
same manner, viruses with knockouts of either the 3abc gene cluster or 
genes 7a and 7b in FIPV produced no clinical symptoms in cats at 
doses that were fatal with wild-type virus (Haijema et al., 2004). The 
deletion of genes 2a and HE, or of genes 4 and 5a, in MHV completely 
abrogated the lethality of intracranial infection in mice (de Haan et al., 
2002a). Even a single point mutation in MHV ORF 2a, which had 
no effect in tissue culture, was found to greatly attenuate virulence 
in vivo (Sperry et al., 2005). In a study that took the opposite approach 
to assessing accessory protein function, it was discovered that
engineered insertion of gene 6 of SARS-CoV greatly enhanced the virulence
of an attenuated variant of MHV (Pewe et al., 2005). 

The most extensively characterized accessory protein is HE (formerly 
called E3), which is a fourth constituent of the membrane envelope in 
many group 2 coronaviruses (Brian et al., 1995). HE forms a second set 
of small spikes that appear as an understory among the tall S protein 
spikes. It was first identified as a hemagglutinin in HEV (Callebaut and 
Pensaert, 1980) and BCoV (King and Brian, 1982; King et al., 1985). The 
HE monomer has an N-exo, C-endo transmembrane topology, with an 
amino-terminal signal peptide, a large ectodomain, a transmembrane 
anchor, and a very short, carboxy-terminal endodomain. Monomers of
HE, prior to glycosylation, are 48 kDa; this size increases to 65 kDa
after addition and processing of oligosaccharide, which is exclusively
N-linked (Hogue et al., 1989; Kienzle et al., 1990; Yokomori et al.,
1989). The mature protein is a homodimer that is stabilized by both 
intrachain and interchain disulfide bonds (Hogue et al., 1989). The 
hemagglutinating property of HE raised the possibility that, in the 
viruses in which it appears, this protein may duplicate or replace the 
role that is assigned to the coronavirus S protein. However, it has been 
shown, through the construction of MHV-BCoV chimeric viruses, that 
the BCoV HE protein, in the absence of BCoV S protein, is not
sufficient for initiation of infection in tissue culture (Popova and Zhang,
2002).

The HE protein also contains an acetylesterase activity. This was 
originally discovered in BCoV and HCoV-OC43, where it was shown to 
be similar to the receptor-binding and receptor-destroying activity 
found in influenza C virus (Vlasak et al., 1988a, b). The nature of the 
esterase enzyme has subsequently been comprehensively studied and 
compared among a number of group 2 coronaviruses (Klausegger et al., 
1999; Regl et al., 1999; Smits et al., 2005). HE proteins of BCoV,
HCoV-OC43, ECoV, and MHV strain DVIM were found to be
sialate-9-O-acetylesterases. By contrast, HE proteins of RCoV and MHV strains
S and JHM were found to be sialate-4-O-acetylesterases. Surprisingly,
the coronavirus HE gene is clearly related to the influenza C virus HA1 
gene (Luytjes et al., 1988). Equally remarkably, toroviruses also
possess a homolog of the HE gene but at a different genomic locus than
where it appears in the group 2 coronaviruses (Cornelissen et al.,
1997). This may be evidence of genetic trafficking among pairs of 
ancestors of these three viruses, as was originally proposed (Luytjes 
et al., 1988; Snijder et al., 1991). Alternatively, it may indicate that 
members of different virus families independently acquired the HE 
gene by horizontal transfer from cellular sources (Cornelissen et al.,
1997). 

There are two ways in which HE could act in coronavirus replication.
It could serve as a cofactor for S, assisting attachment of virus to host
cells. Additionally, it could prevent aggregation of progeny virions and
facilitate travel of virus through the extracellular mucosa (Cornelissen et al.,
1997). The role of HE protein in coronavirus infection has been
systematically documented in a recent pair of elegant studies (Kazi et al.,
2005; Lissenberg et al., 2005). To evaluate the cost and benefit of the
HE gene, three isogenic MHV mutants were engineered: HE+, with an
expressed and functional HE gene; HE0, with an expressed HE gene
that was inactive, owing to active site point mutations; and HE-,
which lacked HE expression because of an introduced frameshift. It
was demonstrated that, following multiple passages, there was rapid
loss of HE expression in the HE+ virus. Moreover, competition
experiments showed a growth advantage for the HE- virus, but not the HE0
virus. Consistent with this, examination of esterase-negative mutants
arising from the HE+ virus showed that it was not loss of activity, but,
rather, loss of the ability of HE to be incorporated into virions that
correlated with the growth advantage of HE- viruses (Lissenberg
et al., 2005). By contrast, in infections of mice, it was found that the
presence of HE (whether or not it was enzymatically active)
dramatically enhanced neurovirulence, as measured by viral spread and
lethality (Kazi et al., 2005). These results imply that sialic acid-bearing
coreceptors can function to influence the course of MHV infection. Thus,
the HE protein is a burden in vitro but provides an advantage to the
virus in vivo. The selection against HE in vitro provides a cautionary
example that tissue culture adaptation of a virus can rapidly lead
to selection of a variant that differs from the natural isolate.


IV. Viral Replication Cycle and Virion Assembly 

Coronavirus infections are initiated by the binding of virions to
cellular receptors (Fig. 5). This sets off a series of events culminating in the
deposition of the nucleocapsid into the cytoplasm, where the viral
genome becomes available for translation. The positive-sense genome,
which also serves as the first mRNA of infection, is translated into the 
enormous replicase polyprotein. The replicase then uses the genome as 
the template for the synthesis, via negative-strand intermediates, of 
both progeny genomes and a set of subgenomic mRNAs. The latter 
are translated into structural proteins and accessory proteins. The 
membrane-bound structural proteins, M, S, and E, are inserted into 
the ER, from where they transit to the endoplasmic reticulum-Golgi 
intermediate compartment (ERGIC). Nucleocapsids are formed from 
the encapsidation of progeny genomes by N protein, and these coalesce 
with the membrane-bound components, forming virions by budding 
into the ERGIC. Finally, progeny virions are exported from infected 
cells by transport to the plasma membrane in smooth-walled vesicles, 
or Golgi sacs, that remain to be more clearly defined. During infection 
by some coronaviruses, but not others, a fraction of S protein that 
has not been assembled into virions ultimately reaches the plasma 
membrane. At the cell surface S protein can cause the fusion of an 
infected cell with adjacent, uninfected cells, leading to the formation of 
large, multinucleate syncytia. This enables the spread of infection 
independent of the action of extracellular virus, thereby providing
some measure of escape from immune surveillance. Key aspects of
the coronavirus replication cycle are discussed in more detail in the
remainder of this section and in the next section (Section V).

Fig 5. The coronavirus life cycle.


A. Receptors and Entry 

1. Receptors 

The pairings of coronaviruses and their corresponding receptors are 
generally highly species specific, but the adaptation of SARS-CoV to 
the human population has reminded us that this allegiance is mutable. 
Well prior to the emergence of SARS, it was clearly documented that 
another coronavirus, BCoV, was capable of sporadic cross-species 
transmission (Saif, 2004). Viruses very closely related to BCoV had 
been isolated from wild ruminants (Tsunemitsu et al., 1995), domestic
dogs (Erles et al., 2003), and, in one case, a human child (Zhang et al.,
1994). Nevertheless, the interaction between S protein and receptor
remains the principal, if not sole, determinant of coronavirus host
species range and tissue tropism. At the cellular level, this has been 
demonstrated by manipulation of each of the interacting partners. 
First, expression of an identified receptor in nonpermissive cells, often 
of a heterologous species, invariably has rendered those cells
permissive for the corresponding coronavirus (Delmas et al., 1992; Dveksler
et al., 1991; Li et al., 2003, 2004; Mossel et al., 2005; Tresnan et al.,
1996; Yeager et al., 1992). Second, the engineered swapping of S
protein ectodomains has been shown to change the in vitro host cell
species specificity of MHV to that of FIPV (Kuo et al., 2000) or,
conversely, of FIPV to that of MHV (Haijema et al., 2003). Similarly,
exchange of the relevant regions of S protein ectodomains was shown 
to transform a strictly respiratory isolate of TGEV into a more virulent, 
enterotropic strain (Sanchez et al., 1999). Replacement of the S protein
ectodomain of MHV strain A59 caused the virus to acquire the highly
virulent neurotropism of MHV strain 4 (Phillips et al., 1999) or the
highly virulent hepatotropism of MHV strain 2 (Navas et al., 2001).

Table III lists the known cellular receptors for coronaviruses of 
groups 1 and 2; to date no receptors have been identified for
coronaviruses of group 3. Group 2 coronavirus receptors include the earliest
and the most recent of the items in Table III. The MHV receptor
(formerly MHVR1, now called mCEACAM1) is a member of the
carcinoembryonic antigen (CEA) family, a group of proteins within the
immunoglobulin (Ig) superfamily. CEACAM1 was the first receptor
discovered for a coronavirus, and, indeed, it was one of the first
receptors found for any virus (Williams et al., 1990, 1991). Cloning of cDNA
to the largest mRNA for this protein revealed that full-length
CEACAM1 has four Ig-like domains (Dveksler et al., 1991), but a number of
two- and four-domain versions of the molecule were later found to
be expressed in mouse cells. This diversity of MHV receptor isoforms
was found to be generated by multiple alleles of the Ceacam1 gene
as well as by the existence of multiple alternative splicing variants
of its mRNA (Compton, 1994; Dveksler et al., 1993a,b; Ohtsuka and
Taguchi, 1997; Ohtsuka et al., 1996; Yokomori and Lai, 1992). The wide
range of pathogenicity of MHV in mice is therefore thought to result 
from the interactions of S proteins of different virus strains with the 
tissue-specific spectra of receptor variants displayed in mice having 
different genetic backgrounds. A number of lines of evidence argue 
that CEACAM1 is the only biologically relevant receptor for MHV. 
This was initially suggested by an early experiment showing that 
in vivo administration of a monoclonal antibody to CEACAM1 greatly
increased the survival of mice subsequently given a lethal challenge
of MHV (Smith et al., 1991). More definitively, it was demonstrated
that homozygous Ceacam1 knockout mice were totally resistant to
infection by high doses of MHV (Hemmila et al., 2004). Thus, even
though CEACAM2, the product of the other murine Ceacam gene family
member, can function as a weak MHV receptor in tissue culture
(Nedellec et al., 1994), it cannot be used as an alternative
receptor in vivo.

TABLE III
CORONAVIRUS RECEPTORS

Group  Virus species  Receptor                                  Reference
1      TGEV           Porcine aminopeptidase N (pAPN)           Delmas et al., 1992
       PRCoV          Porcine aminopeptidase N (pAPN)           Delmas et al., 1994b
       FIPV           Feline aminopeptidase N (fAPN)            Tresnan et al., 1996
       FCoV           Feline aminopeptidase N (fAPN)            Tresnan et al., 1996
       CCoV           Canine aminopeptidase N (cAPN)            Benbacer et al., 1997
       HCoV-229E      Human aminopeptidase N (hAPN)             Yeager et al., 1992
       HCoV-NL63      Angiotensin-converting enzyme 2 (ACE2)    Hofmann et al., 2005
2      MHV            Murine carcinoembryonic antigen-related   Nedellec et al., 1994*;
                      adhesion molecules 1 and 2*               Williams et al., 1991
                      (mCEACAM1, mCEACAM2*)
       BCoV           9-O-acetyl sialic acid                    Schultze et al., 1991
       SARS-CoV       Angiotensin-converting enzyme 2 (ACE2)    Li et al., 2003
                      CD209L (L-SIGN)                           Jeffers et al., 2004

* The mCEACAM2 molecule functions as a weak MHV receptor in tissue
culture but does not serve as an alternate receptor in vivo (Hemmila
et al., 2004).

Initial studies of the structural requirements for CEACAM1 function
showed that the molecule must be glycosylated in order to serve as an
MHV receptor (Pensiero et al., 1992). Moreover, the amino-terminal
Ig-like domain was found to be the part of the molecule that is bound
both by MHV S protein and by the monoclonal antibody originally used
to identify the receptor (Dveksler et al., 1993b). The essential
difference between receptor alleles with high and low affinity for
S protein has been mapped to a determinant of as few as six amino acid
residues in the amino-terminal domain (Rao et al., 1997; Wessner
et al., 1998). These critical residues fall within a prominent,
uniquely convoluted loop in the recently solved x-ray crystallographic
structure of a two-Ig-domain isoform of CEACAM1 (Tan et al., 2002).
Notably, this loop was found to be topologically similar to protruding
loops of the virus-binding domains of the receptors for rhinoviruses,
HIV, and measles virus, all of which, like CEACAM1, are cell adhesion
molecules. The CEACAM1 structure now provides a basis for beginning to
understand the relative affinities of receptor variants for different
S protein ligands.

Other group 2 coronaviruses use different receptors. The rat
coronaviruses RCoV and SDAV, although closely related to MHV and able
to grow in some of the same cell lines as MHV, do not enter cells via
mCEACAM1. Anti-CEACAM1 monoclonal antibody, which totally blocks MHV
infection, was shown to have no effect on infection by rat
coronaviruses; moreover, expression of mCEACAM1 in nonpermissive BHK
cells rendered them susceptible to MHV but not to rat coronaviruses
(Gagneten et al., 1996). BCoV is phylogenetically close to MHV, but
the two viruses share no common hosts, nor do they grow in any of the
same cell lines in tissue culture. To date, the only identified cell
attachment factor for BCoV is 9-O-acetyl sialic acid (Schultze et al.,
1991), but it is not yet clear whether this moiety must be linked to
specific proteins or glycolipids or whether there is also a specific
cellular protein receptor for BCoV.

Not surprisingly, SARS-CoV, which is phylogenetically the most distant
of the group 2 coronaviruses, uses a receptor wholly unrelated to
CEACAMs. The SARS-CoV receptor, which was found remarkably soon after
the discovery of the virus, is angiotensin-converting enzyme 2 (ACE2).
It was identified by using a SARS-CoV S1-IgG fusion protein to
immunoprecipitate membrane proteins from Vero E6 cells, an African
green monkey kidney cell line that is the best in vitro host for
SARS-CoV (Li et al., 2003). Binding of S1-IgG to Vero E6 cells was
inhibited by soluble ACE2 protein but not by a related protein, ACE1.
Expression of cloned ACE2 cDNA was then shown to render nonpermissive
cells susceptible to infection by SARS-CoV (Li et al., 2003). ACE2 was
also identified by expression cloning of an S1-binding activity, and it
was shown to render cells infectable by a retroviral pseudotype
carrying the SARS-CoV S protein (Wang et al., 2004).

ACE2 is a zinc-binding carboxypeptidase that is involved in the
regulation of heart function. It is an N-exo, C-endo transmembrane
glycoprotein with a broad tissue distribution. Active-site mutants of
ACE2 showed no detectable defects in binding to SARS-CoV S protein
(Moore et al., 2004) or in promoting S protein-mediated syncytia
formation (Li et al., 2003), suggesting that ACE2 catalytic activity is
not required for receptor function. This conclusion, however, remains
to be verified by direct SARS-CoV infection. Recently solved x-ray
structures of ACE2 have revealed that the binding of an inhibitor in
the active site induces a large conformational change in the enzyme
(Towler et al., 2004). Although this finding raised the possibility of
a means to interfere with the initiation of infection, the inhibitor
does not affect S protein binding or the receptor function of ACE2
(Li et al., 2005a).

Numerous cell lines from a range of species have been classified with
respect to their permissivity or nonpermissivity to SARS-CoV
(Gillim-Ross et al., 2004; Giroglou et al., 2004; Mossel et al., 2005),
thereby allowing inferences as to which species homologs of ACE2 could
have some degree of SARS-CoV receptor activity. In direct tests of S1
binding, human ACE2 was shown to be a much better receptor than mouse
ACE2, and the receptor activity of rat ACE2 was barely detectable above
background (Li et al., 2004). In all cases tested, nonpermissive cells
became permissive upon expression of human ACE2 (Mossel et al., 2005).
The full picture of the factors influencing SARS-CoV host and tissue
tropism is still emerging. Human CD209L (also called L-SIGN or
DC-SIGNR), a lectin family member, has been found to act as a second
receptor for SARS-CoV, albeit with much lower efficiency than ACE2
(Jeffers et al., 2004). A related lectin, DC-SIGN, was identified as a
coreceptor, since it was able to transfer the virus from dendritic
cells to susceptible cells; DC-SIGN could not, however, act as a
receptor on its own (Marzi et al., 2004; Yang et al., 2004).

Many group 1 coronaviruses use the aminopeptidase N (APN) of their
cognate species as a receptor (Table III) (Delmas et al., 1992;
Tresnan et al., 1996; Yeager et al., 1992). APN (also called CD13) is a
cell-surface, zinc-binding protease that contributes to the digestion of
small peptides in respiratory and enteric epithelia; it is also found in
human neural tissue that is susceptible to HCoV-229E (Lachance
et al., 1998). The APN molecule is a homodimer; each monomer has a
C-exo, N-endo membrane orientation and is heavily glycosylated.
Competition experiments with monoclonal antibodies suggested that the
catalytic domain of hAPN and the binding site for HCoV-229E partly
overlap (Yeager et al., 1992). However, neither specific APN inhibitors
nor mutational disruption of the catalytic site of pAPN affected its
TGEV receptor activity, indicating that the enzymatic activity of APN,
per se, is not required for initiation of infection (Delmas et al.,
1994a). In general, the receptor activities of APN homologs are not
interchangeable: hAPN cannot act as a receptor for TGEV (Delmas
et al., 1994a), and pAPN cannot act as a receptor for HCoV-229E (Kolb
et al., 1996). Curiously, fAPN can serve as a receptor not only for
FIPV but also for CCoV, TGEV, and HCoV-229E (Tresnan et al., 1996).
These contrasting properties have been used as the framework for
dissecting the basis of species-specific or -nonspecific function,
through the construction and analysis of chimeric receptors (Benbacer
et al., 1997; Delmas et al., 1994a; Hegyi and Kolb, 1998; Kolb et al.,
1996, 1997). However, chimera construction has not revealed a single
linear determinant for virus binding. Rather, two different regions of
the molecule have been found to influence receptor activity for a given
coronavirus. A detailed study of one of these regions showed that the
critical characteristic of chimeras that exclude HCoV-229E is a
particular glycosylation site. HCoV-229E probably does not bind directly
to this region of APN; rather, its binding is hindered in homologs that
are glycosylated at this site (Wentworth and Holmes, 2001).

Not all group 1 coronaviruses use APN as a receptor, however. It has
been proposed that one subset of FIPV strains uses a different
receptor, since an antibody to fAPN blocked replication of type II
strains of FIPV but not of type I strains (Hohdatsu et al., 1998).
This conclusion is consistent with the observation that there is
greater sequence divergence between type I and type II FIPV S proteins
than there is between type II FIPV S proteins and the S proteins of
CCoV or TGEV (Herrewegh et al., 1998; Motokawa et al., 1996).
Likewise, although it has been suggested that pAPN can facilitate
cellular entry of PEDV (Oh et al., 2003), the major receptor for PEDV
probably differs from that for TGEV, since the two viruses grow in
mutually exclusive sets of cell lines derived from different species
(Hofmann and Wyler, 1988). The most striking exception to the
generality of APN as a receptor for group 1 coronaviruses is the
discovery that HCoV-NL63 cannot use hAPN to initiate infection;
instead, it employs the same receptor as SARS-CoV, namely ACE2
(Hofmann et al., 2005). This finding raises intriguing questions, one
of which is why HCoV-NL63 causes a much milder respiratory disease
than SARS-CoV does. Another is why two very different zinc-binding,
cell-surface peptidases, APN and ACE2, should serve as receptors for
such a substantial number of coronaviruses. For now, this can only be
ascribed to coincidence, but it may later prove to have deeper
significance.

2. Receptor Recognition 

The more variable of the two portions of the spike molecule, S1, is
the part that binds to the receptor. Binding leads to a conformational
change that results in the more highly conserved portion of the spike
molecule, S2, mediating fusion between virion and cell membranes.
Just as different coronaviruses can bind to different receptors,
coronaviruses also appear to use different regions of S1 to do so.
Receptor-binding domains (RBDs) have so far been mapped in four
S proteins (Fig. 2). In the group 1 coronavirus TGEV, the RBD was
localized to amino acids 579-655, a region highly conserved among the
S proteins of TGEV, PRCoV, FIPV, FCoV, and CCoV (Godet et al.,
1994). For the more distantly related group 1 coronavirus HCoV-229E,
the RBD was found to fall in an adjacent, nonoverlapping segment of
S1, amino acids 417-547 (Bonavia et al., 2003). By contrast, the RBD of
MHV was localized to the amino terminus of the S molecule, amino acids
1-330 (Kubo et al., 1994; Suzuki and Taguchi, 1996; Taguchi, 1995).
Finally, the RBD of SARS-CoV was mapped to amino acids 270-510 or
303-537 by binding of S protein fragments to Vero cells (Babcock
et al., 2004; Xiao et al., 2003). These loci were contained within a
domain shown to harbor the epitope for a neutralizing single-chain
antibody fragment that blocked S1 association with the ACE2 receptor
(Sui et al., 2004). The SARS-CoV RBD was more finely delimited, to
amino acids 318-510, by analysis of the binding to ACE2 of a large set
of S1 constructs (Wong et al., 2004). Thus, on a linear map of
S proteins aligned principally by their S2 domains, the MHV RBD falls
near the amino end of S1, the SARS-CoV RBD is in the middle of S1,
and the TGEV and HCoV-229E RBDs fall near the carboxyl end of S1.
The complementarity of the MHV and TGEV RBD loci is further
emphasized by the fact that substantial deletions are tolerated in
TGEV S1 in the region that corresponds to the MHV RBD (Laude
et al., 1995). Conversely, substantial deletions are tolerated in MHV
S1 in the region that corresponds to the TGEV RBD (Parker et al.,
1989; Rowe et al., 1997).

For MHV, persistent infection in tissue culture was shown to lead to 
the selection of variant viruses with an extended host range (Baric 
et al., 1997, 1999; Schickli et al., 1997). These viruses gained the ability
to grow in cell lines from numerous species not permissive to wild-type 
MHV through an acquired recognition of receptors other than CEA- 
CAM1. Analysis and engineered reconstruction of one of these selected 
variants showed that a relatively small number of amino acid changes 
in the S protein RBD accounted for its extended host range (Schickli 
et al., 2004; Thackray and Holmes, 2004). Comparison of the RBDs of 
various strains of MHV, of the extended host range mutant of MHV, 
and of other group 2 coronaviruses allowed the identification of five 
residues in the RBD that were uniquely conserved among MHV 
strains (Thackray et al., 2005). Mutations in some of these residues 
were lethal or resulted in viruses that formed very small plaques; in
particular, a tyrosine at position 162 of the RBD was proposed as a 
candidate element in a key interaction with the receptor. 

A set of elegant studies with the SARS-CoV S protein and ACE2 has
provided the most detailed picture of RBD-receptor interactions yet
available for any coronavirus. Aided by the x-ray structure of ACE2,
Li et al. (2005a) used the rat ACE2 molecule, which has negligible
receptor activity, as a scaffold to identify critical residues in human
ACE2. Transfer of as few as four human ACE2 residues to rat ACE2
enabled the latter to bind S protein almost as well as human ACE2 did.
A similar approach was used to determine the key S1 residue changes
that allowed the interspecies jump of SARS-CoV. The S1 domains of two
SARS-CoV isolates were compared in this analysis: one (TOR2) from
the main 2002-2003 SARS outbreak, and one (GD) from the
subsequent 2003-2004 outbreak; the latter outbreak was much less
severe and did not include any human-to-human transmission. Both
the TOR2 and GD viruses are thought to have been transmitted to
humans from palm civets, the final intermediary host in the jump of
SARS-CoV from an unknown natural reservoir. However, only the
TOR2 virus efficiently adapted to humans. Correspondingly, it was
found that the S1 domains of both the TOR2 and GD viruses bound
to palm civet ACE2, but only TOR2 S1 bound to human ACE2 (Li et al.,
2005a). Binding experiments with numerous chimeric variants were
used to chart precisely which of the multiple coordinated changes,
both in the S1 RBD and in human and palm civet ACE2, could account
for differences in the mutual affinities of the two molecules. The
basis for these results was then deduced from the x-ray structure of
human ACE2 in complex with the SARS-CoV S protein RBD (Li et al.,
2005b). The RBD was found to bind to the amino-terminal, catalytic
domain of ACE2, contacting it with a concave, 71-residue loop.
Inspection of this contact interface revealed that a remarkably small
number of RBD amino acid changes were critical to the adaptation of
the virus from one species homolog of ACE2 to another. A change as
subtle as the gain of a methyl group (serine to threonine at residue
487 of the RBD) that fits into a hydrophobic pocket on the receptor
could account for a 20-fold increase in the affinity of S1 for human
ACE2.
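To put the 20-fold figure in energetic terms, the standard relation between binding free energy and the equilibrium dissociation constant can be applied. This is only a back-of-the-envelope sketch, not a value reported in the cited studies; it assumes that a "20-fold increase in affinity" corresponds to a 20-fold decrease in the dissociation constant K_d for the Thr487 versus the Ser487 RBD, and it takes T = 298 K (so RT is about 2.48 kJ/mol).

```latex
\Delta\Delta G^\circ
  = \Delta G^\circ_{\mathrm{Thr487}} - \Delta G^\circ_{\mathrm{Ser487}}
  = RT \ln\frac{K_{d,\mathrm{Thr487}}}{K_{d,\mathrm{Ser487}}}
  = RT \ln\frac{1}{20}
  \approx -(2.48\ \mathrm{kJ\,mol^{-1}})(3.00)
  \approx -7.4\ \mathrm{kJ\,mol^{-1}}
  \quad (\approx -1.8\ \mathrm{kcal\,mol^{-1}})
```

Because affinity depends exponentially on binding free energy, a change of well under 2 kcal/mol, of the scale that a single added methyl group could plausibly contribute, is enough to shift affinity by more than an order of magnitude.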

3. S Protein Conformational Change and Fusion 

The binding of spike to its cellular receptor triggers a major
conformational change in the S molecule. In some cases, induction of
this conformational change may also require a shift to an acidic pH.
Accordingly, some coronaviruses, such as MHV, fuse with the plasma
membrane at the cell surface (Sturman et al., 1990; Weismiller
et al., 1990), while others, such as TGEV (Hansen et al., 1998),
HCoV-229E (Nomura et al., 2004), and SARS-CoV (Hofmann et al., 2004;
Simmons et al., 2004; Yang et al., 2004), appear to enter the cell via
receptor-mediated endocytosis and then fuse with the membranes of
acidified endosomes. There may be a very fine balance between these
two modes of entry. For MHV, it was found that as few as three amino
acid changes in a heptad repeat region in S2 could govern the switch
from plasma membrane fusion to strictly acid pH-dependent fusion
(Gallagher et al., 1991; Nash and Buchmeier, 1997). For SARS-CoV,
protease treatment of cells at the earliest steps of infection was
found to allow the virus to enter cells from the surface, rather than
through an endocytic pathway (Matsuyama et al., 2005). Such treatment
enhanced the infectivity of the virus by orders of magnitude, and this
enhancement was receptor dependent. Although SARS-CoV S protein is
not detectably cleaved in virions or pseudovirions produced in tissue
culture (Simmons et al., 2004; Song et al., 2004), protease treatment
may mimic the environment resulting from an inflammatory response in
infected lungs.

Much of the characterization of the receptor-induced conformational
change in S was initially carried out with the MHV S protein, for which
it was found that the effects of receptor binding could also be elicited
by treatment of virions at mildly alkaline pH (Sturman et al., 1990).
Such treatment caused the dissociation and release of the cleaved S1
subunit and the aggregation of S2 subunits; the accompanying
conformational changes in S1 were monitored by the differential access
of a panel of monoclonal antibodies at neutral and alkaline pH
(Weismiller et al., 1990). Disulfide bond formation plays an important
role in S protein folding, and disulfides in S1 may become rearranged
during the conformational transitions of S1 following receptor binding
(Lewicki and Gallagher, 2002; Opstelten et al., 1993; Sturman et al.,
1990). The S protein of the highly virulent MHV strain 4 (JHM) has
been shown to exist in a particularly metastable configuration. This
results in a hair-trigger spike so highly fusogenic that it can mediate
fusion between infected cells and cells lacking receptors, thereby
leading to more extensive neuropathogenesis than occurs with other
MHV strains (Gallagher and Buchmeier, 2001; Gallagher et al., 1992;
Krueger et al., 2001; Nash and Buchmeier, 1996).

In the normal spike-receptor interaction, both the S1-binding and
the S1-activation functions were found to reside in the amino-terminal
Ig domain of CEACAM1 (Miura et al., 2004). The role of the additional
Ig domain(s) in the various CEACAM isoforms is apparently to give the
virus access to the amino-terminal Ig domain. Similarly, although the
RBD of the MHV S protein lies near the amino terminus of S1, portions
of the molecule distal to this site can significantly influence the
stability of the S1-receptor interaction (Gallagher, 1997). The
conformational change that separates S1 from the rest of the molecule,
in turn, transmits a major change to S2. This secondary change has been
monitored by the differential susceptibility of S2 to protease treatment
before and after the binding of S1 to soluble receptor (Matsuyama and
Taguchi, 2002). Additionally, the same changes were shown to be caused,
in the absence of receptor, by mildly alkaline pH, which induced a
fusogenic state in S2 that could be measured by a liposome flotation
assay (Zelus et al., 2003).

It is now recognized that the coronavirus S protein is a type I viral
fusion protein with functional similarities to the fusion proteins of
phylogenetically distant RNA viruses such as influenza virus, HIV,
and Ebola virus (Bosch et al., 2003). Like its counterparts in other
viruses, the coronavirus S2 domain contains two separated heptad
repeats, HR1 and HR2, with a fusion peptide upstream of HR1 and the
transmembrane domain immediately downstream of HR2 (Fig. 2).
Mutations in the MHV S protein HR1 and HR2 regions were shown to
inhibit or abolish fusion (Luo and Weiss, 1998; Luo et al., 1999).
Unlike its counterparts, however, the coronavirus S protein does not
require cleavage to be fusogenic, and it contains an internal fusion
peptide, although the exact location of this peptide is not agreed
upon (Guillen et al., 2005; Sainz et al., 2005). Even for MHV S and
other cleaved S proteins, the fusion peptide is not the amino terminus
of S2 created by cleavage (Luo and Weiss, 1998), as it is in other
type I fusion proteins.

The receptor-mediated conformational change in S1 and the dissociation
of S1 from S2 are thought to initiate a major rearrangement in the
remaining S2 trimer. This rearrangement exposes a fusion peptide that
interacts with the host cell membrane, and it brings together the two
heptad repeats in each monomer so as to form an antiparallel, six-helix
“trimer-of-dimers” bundle. The result is the juxtaposition of the viral
and cellular membranes in sufficient proximity to allow the mixing of
their lipid bilayers and the delivery of the contents of the virion
into the cytoplasm. The trimer of dimers is extremely stable, forming a
rod-like, protease-resistant complex, the biophysical properties of
which have been studied in depth with model peptides for the S proteins
of MHV (Bosch et al., 2003) and SARS-CoV (Bosch et al., 2003, 2004;
Ingallinella et al., 2004; Liu et al., 2004; Tripet et al., 2004).
X-ray crystallographic structures have been solved for peptide
complexes of both the MHV S protein (Xu et al., 2004a) and the
SARS-CoV S protein (Duquerroy et al., 2005; Supekar et al., 2004;
Xu et al., 2004b). In the six-helix bundle, the three HR1 helices were
found to form a central coiled-coil core, and the three HR2 helices, in
an antiparallel orientation, pack into the grooves between the HR1
monomers. There is no contact between the HR2 monomers, each of
which associates with the HR1 grooves through hydrophobic
interactions. The overall structures obtained for MHV S and SARS-CoV S
are highly similar to each other and strongly resemble the structures
of the fusion cores of influenza virus HA and HIV gp41. Noteworthy
differences are that the coronavirus HR1 coiled coil is two to three
times larger than its counterparts in other viruses and that the much
shorter coronavirus HR2 helices assume a unique conformation within
the bundle. A major goal of these studies is the design of peptides
that can inhibit formation of this complex in SARS-CoV infections.

In addition to the mechanisms of the conformational rearrangements of
S1 and S2, other factors influence coronavirus fusion and entry in ways
that are not yet well understood. For two coronaviruses, the role of
cholesterol in virus entry has been investigated. Cholesterol
supplementation was found to augment MHV replication, while cholesterol
depletion was inhibitory; these effects were shown to occur at the
earliest stages of infection (Thorp and Gallagher, 2004). Contrary to
expectations, cholesterol did not act by clustering CEACAM receptors
into lipid rafts, either before or after the binding of virus to
receptor (Choi et al., 2005; Thorp and Gallagher, 2004). However,
cell-bound virions did cluster into lipid rafts, suggesting that MHV
S protein associates with some host factor other than CEACAM prior to
entry (Choi et al., 2005). For HCoV-229E, on the other hand, both virus
and hAPN receptor were found to redistribute on the cell surface from
an initially dispersed pattern to clusters within caveolin-1-rich lipid
rafts (Nomura et al., 2004). Thus, the mechanism by which cholesterol
assists infection may differ between coronaviruses that enter the cell
via receptor-mediated endocytosis and those that fuse with the plasma
membrane.

For those coronaviruses that induce syncytia formation, cell-cell
fusion appears to have different requirements from virus-cell fusion.
Studies with MHV have long noted a correlation between the degree of
S protein cleavage and the amount of cell-cell fusion, both of which
could be enhanced by trypsin treatment (Sturman et al., 1985). The
extent and kinetics of S protein cleavage were shown to vary among
different cell lines, implicating a cellular, rather than viral,
protease (Frana et al., 1985). Consistent with this, an MHV strain A59
mutant isolated from persistently infected glial cells was found to
have an altered cleavage site, RRADR instead of the wild-type RRAHR
(Gombold et al., 1993); this change caused an extreme delay, but not
abrogation, of fusion of infected cells. Studies of expressed MHV
S proteins with wild-type or mutated cleavage sites gave essentially
the same results, showing that the fusion delay was strictly a property
of the mutant S protein (Bos et al., 1995; Stauber et al., 1993;
Taguchi, 1993). However, S protein was found not to be cleaved at all
in MHV-infected primary glial cells or hepatocytes, indicating that
cleavage was not a requirement for virus-cell fusion (Hingley et al.,
1998). It was demonstrated that furin or a furin-like protease is
responsible for MHV S cleavage in tissue culture (de Haan et al.,
2004). Treatment of cells with a specific furin inhibitor blocked both
cleavage and cell-cell fusion, but it had no effect on virus-cell fusion.

Another component of the MHV S protein that operates in cell-cell 
fusion is the cysteine-rich region of the endodomain, mutation of which 
delays or abrogates syncytia formation (Bos et al., 1995; Chang et al., 
2000). It is currently not known how this segment of the S molecule, 
which is on the opposite side of the membrane from the six-helix 
bundle, participates in the fusion process. The cysteine-rich region of 
the endodomain is a possible target for palmitoylation (Bos et al., 
1995), which is a known modification of MHV S (Niemann and Klenk, 
1981), but, as yet, a role for palmitoylation has not been established. 


B. Virion Assembly Interactions 

Once the full program of viral gene expression is underway, through
transcription, translation, and genome replication, progeny viruses
can begin to assemble. Coronavirus virion assembly proceeds through a
series of cooperative interactions, in the ER and the ERGIC, among the
canonical set of structural proteins, S, M, E, and N. The M protein is
a party to most, if not all, of these interactions and has come to be
recognized as the central organizer of the assembly process. Despite
its dominant role, however, M protein alone is not sufficient for
virion formation. Independently expressed M protein does not assemble
into virion-like structures. Under these circumstances, M was shown to
traverse the secretory pathway as far as the trans-Golgi (Klumperman
et al., 1994; Machamer and Rose, 1987; Machamer et al., 1990; Rottier
and Rose, 1987; Swift and Machamer, 1991), where it forms large,
detergent-insoluble complexes (Krijnse Locker et al., 1995; Weisz
et al., 1993). By contrast, MHV, IBV, TGEV, and FIPV, representative
species from each of the three coronavirus groups, were found to bud
into a proximal compartment, the ERGIC (Klumperman et al., 1994;
Krijnse Locker et al., 1994; Tooze et al., 1984, 1988). These
observations suggested that some factor, in addition to M, must
determine the site of virion assembly and budding.

The identification of the unknown factor came from the development
of virus-like particle (VLP) systems for coronaviruses. Such studies
showed that, for MHV, coexpression of both M protein and the minor
virion component, E protein, was necessary and sufficient for the
formation of particles (Bos et al., 1996; Vennema et al., 1996). The
resulting VLPs were morphologically identical to virions (minus
spikes) and were released from cells by a pathway similar to that used
by virions. Notably, neither the S protein nor the nucleocapsid was
found to be required for VLP formation. These results were
subsequently generalized to coronaviruses from all three groups: BCoV
and TGEV (Baudoux et al., 1998), IBV (Corse and Machamer, 2000,
2003), and SARS-CoV (Mortola and Roy, 2004). Currently, there is one
known exception to this trend: in a separate study of SARS-CoV, M and
N proteins were reported to be necessary and sufficient for VLP
formation, whereas E protein was dispensable (Huang et al., 2004a).
This discrepancy remains to be resolved. It may reflect a unique
aspect of SARS-CoV virion assembly, or, alternatively, it may indicate
that VLP requirements can vary with different expression systems.

1. M Protein—M Protein Interactions 

Since VLPs contain very little E protein, it is assumed that lateral
interactions between M protein monomers are the driving force for
virion envelope formation. These interactions have been explored by
examining the ability of constructed M protein mutants to support or
to interfere with VLP formation. A study that tested the structural
requirements of the M protein found that mutations in the ectodomain,
in any of the three transmembrane domains, or in the carboxy-terminal
endodomain could inhibit or abolish VLP formation (de Haan et al.,
1998a). In particular, the carboxy terminus of M was extremely
sensitive to small deletions or even to point mutations of the final
residue of the molecule. Introduction of many of these carboxy-terminal
mutations into the viral genome revealed a consistent set of effects
on viral viability. However, virions were better able than VLPs to
tolerate carboxy-terminal alterations in M protein, presumably because
virions were stabilized by additional intermolecular interactions not
present in VLPs. In experiments in which both wild-type and mutant
M proteins were coexpressed with E protein, wild-type M protein was
able to rescue low concentrations of assembly-defective mutant
M proteins into VLPs (de Haan et al., 1998a). This finding, coupled
with results from coimmunoprecipitation analyses, provided the basis
for further work, which concluded that monomers of M interact via
multiple contacts throughout the molecule, particularly in the
transmembrane domains (de Haan et al., 2000).

2. S Protein—M Protein Interactions 
That VLPs could be formed in the absence of S protein (Bos et al., 
1996; Vennema et al., 1996) confirmed the much earlier discovery that
treatment of MHV-infected cells with the glycosylation inhibitor
tunicamycin led to the assembly and release of spikeless (and consequently,
noninfectious) virions (Holmes et al., 1981; Rottier et al., 1981). These
findings were also consistent with the properties of certain classical
temperature-sensitive mutants of MHV and IBV, which, owing to S
gene lesions, failed to incorporate spikes into virions at the nonpermissive
temperature (Luytjes et al., 1997; Ricard et al., 1995; Shen et al.,
2004). Independently expressed MHV, FIPV, or IBV S proteins enter
the default secretory pathway and ultimately reach the plasma membrane
(Vennema et al., 1990). In the presence of M protein, however, a
major fraction of S is retained in intracellular membranes, as was 
shown by coimmunoprecipitation of S and M proteins from MHV- 
infected cells (Opstelten et al., 1995). Moreover, the interaction of 
M with S was demonstrated to be specific; complexes of M did not 
impede the progress of a heterologous glycoprotein (the VSV G protein) 
to the plasma membrane. Additionally, kinetic experiments revealed 
that the folding and oligomerization of S protein in the ER is rate 
limiting in the M-S interaction, in which nascent M protein immediately
participates (Opstelten et al., 1995). Complexes of the M and S
proteins were similarly observed in BCoV-infected cells, for which it was
found that M also determines the selection of HE protein for incorporation
into virions (Nguyen and Hogue, 1997). The simplest picture to
be drawn from all this evidence, then, is that S protein is entirely 
passive in assembly but becomes trapped by M protein upon passage 
through the ER. Nevertheless, there are indications that, in some cases, 
S cooperates in its own capture. By the criterion of acquisition of endo H
resistance, independently expressed S protein was found to be transported
to the cell surface with much slower kinetics than S protein
that was incorporated into virions. This led to the proposal that free
S protein harbors intracellular retention signals that become hidden
during virion assembly (Vennema et al., 1990). Such signals have been
found in the cytoplasmic endodomain of the (group 3) IBV S protein, which
contains both a dilysine motif that was shown to specify retention in the
ERGIC and a tyrosine-based motif that causes retrieval by endocytosis
from the plasma membrane (Lontok et al., 2004). Additionally, a novel
dibasic ERGIC retention signal was identified in the S protein endodomains
of group 1 coronaviruses (TGEV, FIPV, and HCoV-229E) and
of SARS-CoV, but not in those of other group 2 coronaviruses, such as MHV
and BCoV.

Although the S protein is not required for VLP formation, it does 
become incorporated into VLPs if it is coexpressed with the M and 
E proteins (Bos et al., 1996; Vennema et al., 1996). VLP manipulations 
thus made it possible to begin to dissect the molecular basis for the 
specific selection of S protein by M protein. As for M-M homotypic 
interactions, the sites within M protein that bind to S protein have not 
yet been pinpointed. On a broader scale, deletion mapping has indicated
that the ectodomain of M protein and the carboxy-terminal
25 residues of the endodomain do not participate in interactions with 
S, even though both of these regions are critical for VLP formation 
(de Haan et al., 1999). The residues of S protein that interact with 
M protein, on the other hand, have been much more precisely localized. 
This mapping began with the swapping of ectodomains between the 
very divergent S proteins of MHV and FIPV (Godeke et al., 2000). This 
type of exchange showed that the incorporation of S protein into VLPs 
of a given species was determined by the presence of merely the 
transmembrane domain and endodomain of S protein from the same 
species. The source of the S ectodomain did not matter. The assembly 
competence of the 1324-residue MHV S protein or the 1452-residue 
FIPV S protein was therefore restricted to just the 61-amino-acid, 
carboxy-terminal region of each of these molecules. That the domain- 
switched S molecules were completely functional was demonstrated by 
the construction of an MHV mutant, designated fMHV, in which the 
ectodomain of the MHV S protein was replaced by that of the FIPV S 
protein (Kuo et al., 2000). As predicted, this mutant gained the ability 
to grow in feline cells, while losing the ability to grow in mouse cells. 
The fMHV chimera provided the basis for powerful selections, based on 
host cell species restriction, that have been used with the reverse 
genetic system of targeted RNA recombination (Section VI) (Kuo and 
Masters, 2002; Masters, 1999; Masters and Rottier, 2005). The converse
construct, an FIPV mutant designated mFIPV, in which the
ectodomain of the FIPV S protein was replaced by that of the MHV 
S protein, had properties exactly complementary to those of fMHV 
(Haijema et al., 2003). 
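The logic of the ectodomain exchange can be illustrated with a short sketch. It is a simplification, not the cloning strategy of Kuo et al. (2000): it treats each S protein as a plain amino acid string and assumes that the swap junction falls exactly at the start of the 61-residue carboxy-terminal region described above. The function name `swap_ectodomain` and the placeholder sequences (which only have the lengths given in the text) are hypothetical.

```python
# Hypothetical sketch of an S protein ectodomain swap (as in fMHV or mFIPV).
# Assumption: the junction lies exactly at the start of the 61-residue
# carboxy-terminal region (transmembrane domain + endodomain).

TAIL_LENGTH = 61


def swap_ectodomain(ecto_donor: str, tail_donor: str, tail_length: int = TAIL_LENGTH) -> str:
    """Join the ectodomain of one S protein to the C-terminal tail of another."""
    if not 0 < tail_length < min(len(ecto_donor), len(tail_donor)):
        raise ValueError("tail_length must be positive and shorter than both proteins")
    ectodomain = ecto_donor[:-tail_length]  # everything upstream of the tail
    tail = tail_donor[-tail_length:]        # transmembrane domain + endodomain
    return ectodomain + tail


if __name__ == "__main__":
    # Placeholder sequences with the lengths given in the text (not real sequences).
    fipv_s = "F" * 1452
    mhv_s = "M" * 1324
    fmhv_s = swap_ectodomain(fipv_s, mhv_s)   # FIPV ectodomain, MHV tail
    mfipv_s = swap_ectodomain(mhv_s, fipv_s)  # MHV ectodomain, FIPV tail
    print(len(fmhv_s), len(mfipv_s))
```

In this picture the tail decides which M protein captures the chimera, and hence which virus can assemble it, while the ectodomain decides host cell species tropism.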
More detailed dissection of the transmembrane domain and endodomain
of the MHV S protein has been carried out to further localize the
determinants of S incorporation into virions (Bosch et al., 2005; Ye et al.,
2004). In one study, the S protein transmembrane domain, or the endodomain,
or both, were swapped with the corresponding region(s) of a
heterologous transmembrane protein, which was expressed as an extra
viral gene product (Ye et al., 2004). Mutations were constructed in this
surrogate virion structural protein, or, alternatively, directly in the S
protein. From this work, the virion assembly property of S was found to
map solely to the 38-residue endodomain, with a major role assigned to
the charge-rich, carboxy-terminal region of the endodomain. Additionally,
it was observed that the adjacent, membrane-proximal, cysteine-rich
region of the endodomain was critical for cell-cell fusion during infection,
consistent with results previously reported from investigations using
S protein expression systems (Bos et al., 1995; Chang et al., 2000). A
second study, based on analysis of a progressive series of carboxy-terminal
truncations of the S protein in VLPs and in viral mutants, also
mapped the virion assembly competence of S to the endodomain (Bosch
et al., 2005). In this work, however, the major role in assembly was
attributed to the cysteine-rich region of the endodomain, and the overall
size, rather than the sequence, of the endodomain was seen to be critical.
Thus, the precise nature of the interaction between the S protein endodomain
and the M protein remains to be resolved.

3. N Protein—M Protein Interactions 

The interaction of the viral nucleocapsid with M protein was originally
examined by the fractionation of purified MHV virions (Sturman
et al., 1980). At 4°C, M protein was separated from other components
on density gradient centrifugation of NP-40-solubilized virion preparations,
but M reassociated with the nucleocapsid when the temperature
was elevated to 37°C. Further analysis suggested that, contrary to 
expectations, this temperature-dependent association was mediated 
by M binding to viral RNA, rather than to N protein. The notion of 
M protein as an RNA-binding protein has been revived in light of 
recent results on the mechanism of genome packaging (Section IV.C) 
(Narayanan et al., 2003a). 

For TGEV virions, the use of particular low-ionic-strength conditions
of NP-40 treatment similarly resulted in the finding that a fraction
of M protein was persistently integrated with subviral cores (Risco
et al., 1996). To assay this association, in vitro-translated M protein
was bound to immobilized nucleocapsid purified from virions (Escors
et al., 2001). Through the combined approaches of deletion mapping,
inhibition by antibodies of defined specificity, and peptide competition,
the M-nucleocapsid interaction was localized to a segment of 16 residues
adjacent to the carboxy terminus of the 262-residue TGEV
M protein.

Studies of MHV have taken genetic avenues to explore the N
protein-M protein interaction. In one report, a viral mutant was constructed
in which the carboxy-terminal two amino acids of the 228-residue
MHV M protein were deleted (Kuo and Masters, 2002), a lesion
previously known to abolish VLP formation (de Haan et al., 1998a).
The resulting highly impaired virus, designated MΔ2, formed tiny
plaques and grew to maximal titers many orders of magnitude lower
than those of the wild type. Multiple independent second-site revertants
of the MΔ2 mutant were isolated and mapped to either the
carboxy terminus of M or that of N. Reconstruction of some of these
compensating mutations, in the presence of the original MΔ2 mutation,
provided evidence for a structural interaction between the carboxy
termini of the M and the N proteins. In a complementary
analysis, a set of viral mutants was created containing all possible
clustered charged-to-alanine mutations in the carboxy-terminal domain 3
of the N protein (Hurst et al., 2005). One of the members of
this set, designated N-CCA4, was extremely defective, having a phenotype
similar to that of the MΔ2 mutant. Multiple independent
second-site suppressors of N-CCA4 were found to map in the
carboxy-terminal region of either the N or the M protein, thereby
reciprocating the genetic cross-talk uncovered with the MΔ2 mutant.
Additionally, it was shown that the transfer of N protein domain 3 to a
heterologous protein allowed incorporation of that protein into MHV
virions.
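Clustered charged-to-alanine scanning, used for the N protein mutants above and for the MHV E gene mutants discussed in the next subsection, replaces groups of nearby charged residues with alanine. The sketch below enumerates such mutants under a common convention that is an assumption here, not something stated in the text: Asp, Glu, Lys, and Arg count as charged (some studies also include His), and a cluster is any window of five consecutive residues containing at least two charged residues. The function name and the test peptide are hypothetical.

```python
# Hypothetical sketch: enumerate clustered charged-to-alanine mutants.
# Assumptions: charged = D, E, K, R; a cluster = at least 2 charged residues
# within a 5-residue window; every charged residue in the window -> Ala.

CHARGED = frozenset("DEKR")


def charged_to_alanine_mutants(protein: str, window: int = 5, min_charged: int = 2):
    """Return (positions, mutant_sequence) pairs, one per distinct cluster.

    Positions are 0-based indices of the residues changed to alanine.
    Overlapping windows that capture the same charged residues yield one mutant.
    """
    mutants = []
    seen = set()
    for start in range(len(protein) - window + 1):
        positions = tuple(
            start + offset
            for offset, residue in enumerate(protein[start:start + window])
            if residue in CHARGED
        )
        if len(positions) < min_charged or positions in seen:
            continue
        seen.add(positions)
        mutant = list(protein)
        for pos in positions:
            mutant[pos] = "A"
        mutants.append((positions, "".join(mutant)))
    return mutants


if __name__ == "__main__":
    for positions, seq in charged_to_alanine_mutants("GDKGGGERGG"):
        print(positions, seq)
```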

4. Role of E Protein 

In contrast to the more overt structural roles of the M, S, and N
proteins, the part played by E protein in assembly is enigmatic. On
discovery of the essential nature of E in VLP formation, it was speculated
that the low amount of E protein in virions and VLPs indicated a
catalytic, rather than structural, function for this factor. E protein
might serve to induce membrane curvature in the ERGIC, or it might
act to pinch off the neck of the viral particle in the final stage of the
budding process (Vennema et al., 1996). In a search for evidence relating
the VLP findings to the situation in whole virions, a set of
clustered charged-to-alanine mutations was constructed in the E gene
of MHV. One of the resulting mutants was markedly thermolabile, and
its assembled virions had striking morphologic defects, exhibiting
pinched and elongated shapes that were rarely seen among wild-type
virions (Fischer et al., 1998). This phenotype clearly supported a critical
role for E protein in virion assembly. Surprisingly, however, it was
later found to be possible to delete the E gene entirely from the MHV
genome, although the resulting ΔE mutant virus was only minimally
viable compared to the wild type (Kuo and Masters, 2003). This indicated
that, for MHV, the E protein is important, but not absolutely
essential, to virion assembly. By contrast, for TGEV, two independent
reverse genetic studies showed that knockout of the E gene was lethal.
Viable virus could be recovered only if E protein was provided in trans
(Curtis et al., 2002; Ortego et al., 2002). This discordance may point to
basic morphogenic differences between group 2 coronaviruses (such
as MHV) and group 1 coronaviruses (such as TGEV). Alternatively, it
is possible that E protein has multiple activities, one of which is
essential for group 1 coronaviruses but is largely dispensable for group
2 coronaviruses.

The information available about E protein at this time is not sufficiently
complete to allow us to understand the function of this tiny
molecule. One of the most intriguing questions is whether E protein
must physically interact with M protein directly, or
whether E acts at a distance. If E protein has multiple roles, then
perhaps both of these possibilities apply. Direct interaction
between the E and M proteins is implied by the observation that, at
least in some cases, coexpression of E and M proteins from different
species does not support VLP formation (Baudoux et al., 1998). The
demonstration that IBV E and M can be cross-linked to one another
also established that the two proteins are in close physical proximity
in infected or transfected cells (Corse and Machamer, 2003).
Contrary to this, some data appear to argue that E acts independently
of M. The individual expression of MHV or IBV E protein results in
membrane vesicles that are exported from cells (Corse and Machamer,
2000; Maeda et al., 1999). Additionally, it has been shown that the
expression of MHV E protein alone leads to the formation of clusters of
convoluted membranous structures highly similar to those seen in
coronavirus-infected cells (Raamsman et al., 2000). This suggests that
the E protein, without other viral proteins, acts to induce membrane
curvature in the ERGIC. Some indirect evidence may also be taken to
indicate that E does not directly contact other viral proteins. Multiple
revertant searches with E gene mutants failed to identify any suppressor
mutations that map in M or in any gene other than E (Fischer
et al., 1998). Similarly, none of the intergenic suppressors of the MΔ2
mutant mapped to the E gene (Kuo and Masters, 2002). It has also been
found that the SARS-CoV E protein forms cation-selective ion channels
in a model membrane system (Wilson et al., 2004). Moreover, this
channel-forming property was contained in the amino-terminal 40
residues of the 76-residue SARS-CoV E molecule. Such an activity
may be the basis for an independent mode of action of E protein.

C. Genome Packaging 

Although a variety of positive- and negative-strand viral RNA species
are synthesized during the course of infection (Section V), coronaviruses
selectively incorporate genomic (positive-strand) RNA into
assembled virions. This may be accomplished with varying degrees of
stringency by different members of the family. Sucrose gradient-purified
virions of MHV have been found to exclusively contain genomic
RNA (Makino et al., 1990). By contrast, similarly purified virions
of BCoV (Hofmann et al., 1990), TGEV (Sethna et al., 1989, 1991), and
IBV (Zhao et al., 1993) have been reported to contain significant quantities
of subgenomic mRNA, in some cases in molar amounts exceeding
those of the genomic RNA. However, in a study of TGEV, in which
virions were extensively purified by an ELISA-based immunopurification
procedure, a very high degree of selectivity for genomic RNA
packaging was observed (Escors et al., 2003).

In those viruses in which it has been mapped, the RNA element that 
specifies selective packaging falls, as would be expected, in a region of 
the genome that is not found in any of the subgenomic mRNAs. In 
MHV, the genomic packaging signal was localized through analysis of 
defective interfering (DI) RNAs. DI RNAs are extensively deleted 
variants of the genome that propagate as molecular parasites, using 
the replicative machinery of a helper virus. Some DI RNAs are packaged
efficiently, while others have lost such a capability. Dissection of
particular members of the former class revealed that a relatively small 
span of internal sequence could account for packaging competence 
(Makino et al., 1990; van der Most et al., 1991). The exact boundaries 
of the MHV packaging signal are not precisely defined, but reports 
from different groups have converged on RNA segments of 180-190 nt, 
within a 220-nt region that is centered some 20.3 kb from the 5' end of 
the genome (Fosmire et al., 1992; Molenkamp and Spaan, 1997). The 
MHV packaging element is thus embedded in the coding sequence of
nsp15, at the distal end of the replicase gene. A core 69-nt RNA
secondary structural element can act as a minimal signal for packaging
(Fosmire et al., 1992; Woo et al., 1997), but larger versions of the
element, consisting of the core plus flanking sequences, function more
efficiently (Cologna and Hogue, 2000; Narayanan and Makino, 2001).
Even the larger versions of the element may not be entirely sufficient,
however: some data suggest that other cis-acting sequences found
in genomic, but not subgenomic, DI RNA contribute to the overall
efficiency of packaging (Bos et al., 1997).
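The expectation stated at the start of this paragraph follows from the structure of the subgenomic mRNAs (Section V): each consists of the leader joined to a body that runs from a downstream site to the 3' end, so a replicase segment lying downstream of the leader is present only in genomic RNA. A minimal sketch of that interval check follows. It ignores sequence spanning the leader-body junction, and all coordinates in the example, including the genome length, leader length, and body start positions, are hypothetical; only the approximate 20.3-kb location of the MHV signal is taken from the text.

```python
# Hypothetical sketch: which positive-sense RNA species contain a genomic region?
# Coordinates are 0-based, half-open [start, end). The model ignores the
# short junction where the leader is fused to each body.


def rnas_containing(region, genome_length, leader_length, body_starts):
    """List the positive-sense RNAs that carry the whole region.

    body_starts maps each sgRNA name to the genome position where its body begins.
    """
    start, end = region
    if not 0 <= start < end <= genome_length:
        raise ValueError("region must lie within the genome")
    hits = ["genome"]
    for name, body_start in sorted(body_starts.items(), key=lambda item: item[1]):
        in_leader = end <= leader_length  # region fits inside the shared leader
        in_body = start >= body_start     # region lies within this sgRNA body
        if in_leader or in_body:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical genome layout (not real MHV coordinates).
    layout = dict(genome_length=31_000, leader_length=70,
                  body_starts={"sgRNA2": 21_800, "sgRNA7": 29_700})
    print(rnas_containing((20_300, 20_490), **layout))
    print(rnas_containing((30_000, 30_100), **layout))
```

With this layout, a region near 20.3 kb is reported only in the genome, whereas a region near the 3' end is shared by every species, which is why a 3' element could not confer genome-selective packaging.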

For the closely related group 2 coronavirus BCoV, the 190-nt genomic
region homologous to the MHV packaging signal has been shown
to have the same function as its MHV counterpart. Moreover, the MHV 
and BCoV packaging signals are able to act in a reciprocal fashion: a 
nonviral RNA containing the MHV packaging signal can be packaged 
by BCoV helper virus, and a nonviral RNA containing the BCoV 
packaging signal can be packaged by MHV helper virus (Cologna and 
Hogue, 2000). This functional homology does not appear to extend 
across group boundaries, though. For the group 1 coronavirus TGEV, 
the packaging signal was also shown to be retained in particular DI 
RNAs, which were found to be incorporated into defective virions that 
could be separated from helper virus by density gradient centrifugation
(Mendez et al., 1996). Surprisingly, dissection of the smallest
packaged DI RNA revealed that the packaging signal for TGEV maps
to the upstream end of the replicase gene, localizing in the region of
100-649 nt from the 5' end of the genome (Escors et al., 2003). For the
group 3 coronavirus IBV, a packaged DI RNA has been isolated and 
characterized (Penzes et al., 1994), but mapping of the packaging 
element in this RNA has thus far been inconclusive, owing to the need 
to decouple requirements for replication from those for packaging 
(Dalton et al., 2001). Nevertheless, it is clear that the IBV DI RNA 
does not harbor a region of the IBV genome homologous to the region 
that contains the packaging signal in MHV. Similarly, the IBV DI RNA 
may also lack the counterpart of the TGEV packaging signal. It will be 
interesting to see whether the packaging signals of viruses in the three 
coronavirus groups, once they are completely characterized, are found 
to retain structural similarities despite differences in sequence and 
location. 

The mechanism by which packaging signals operate is not yet clear, 
and results with MHV have in fact taken an unanticipated turn. In 
this context, it is important to note the distinction between encapsidation
and packaging, two terms that are often used interchangeably
in the coronavirus literature. Encapsidation is the process of formation 
of the nucleocapsid, that is, the cooperative binding of N protein to 
viral RNA. Packaging is the incorporation of the nucleocapsid into 
virions. For enveloped viruses, the two processes are not necessarily
the same. For example, for nonsegmented negative-strand viruses,
both genomic and antigenomic RNA are encapsidated, but only genomic
RNA is packaged. For coronaviruses, it was logical to assume that
encapsidation is initiated by the N protein. Indeed, specific binding of 
MHV N protein to the packaging signal RNA has been demonstrated 
in vitro (Molenkamp and Spaan, 1997). However, in vitro RNA binding 
experiments have also shown a specific interaction between the 
MHV N protein and the leader RNA, which is located at the 5' end 
of subgenomic and genomic RNA (Nelson et al., 2000; Stohlman 
et al., 1988). It remains to be seen whether either of these sequence-specific
modes of RNA binding represents a nucleation step ultimately
leading to encapsidation by multiple monomers of N. The binding of 
N to leader RNA appears incongruent with the specificity of packaging, 
but it is consistent with the observation that anti-N antibodies 
coimmunoprecipitate both subgenomic and genomic RNA from cells 
infected with MHV or BCoV (Baric et al., 1988; Cologna et al., 2000; 
Narayanan et al., 2000). A possible resolution of this paradox has come 
from findings that reveal a role for M protein in the selectivity of 
packaging. Antibodies to MHV M protein were shown to coimmunoprecipitate
the fraction of N protein that is bound to genomic RNA, but
not N protein that is bound to subgenomic RNA (Narayanan et al.,
2000). Furthermore, this specific M-N interaction is dependent on
the presence of the MHV packaging signal (Narayanan and Makino,
2001). Remarkably, recent work with coexpressed MHV proteins has
attributed the direct selection of packaging signal RNA to the M protein. 
Thus, VLPs formed by M and E proteins, but devoid of N protein, were 
found to incorporate a heterologous RNA molecule only if it contained the 
MHV packaging signal (Narayanan et al., 2003a). If this discovery turns 
out to generalize to all coronaviruses, then it will mean that M protein 
orchestrates every single interaction necessary for virion assembly. 


V. RNA Synthesis 
A. Replication and Transcription 

Coronavirus RNA synthesis proceeds by a complex and incompletely
understood mechanism, portions of which involve interactions between
distant segments of the genome (Lai and Cavanagh, 1997; Lai
and Holmes, 2001; van der Most and Spaan, 1995). Following its
translation into the replicase polyproteins, the genomic RNA (gRNA)
next acts as the template for synthesis of negative-sense RNA species.
Further events produce a series of smaller, subgenomic RNAs
(sgRNAs) of both polarities (Fig. 6) (Baric and Yount, 2000; Sethna
et al., 1989, 1991). The positive-sense sgRNAs, each of which serves as
the message for one of the ORFs downstream of the replicase ORF,
have compositions equivalent to large genomic deletions. Positive-sense
sgRNAs contain a 70- to 100-nt leader RNA, which is identical to
the 5' end of the genome, joined at a downstream site to a stretch of
sequence (the body of the sgRNA), which is identical to the 3' end of
the genome. Collectively, the sgRNAs are said to form a 3'-nested set.
The 3'-nested set of sgRNAs, with or without a leader sequence, is a
defining feature of the order Nidovirales (Enjuanes et al., 2000a; van
Vliet et al., 2002). The negative-sense sgRNAs, roughly a tenth to a
hundredth as abundant as their positive-sense counterparts, each
possess the complement of this arrangement, including a 5' oligo(U)
tract of 9-26 residues (Hofmann and Brian, 1991) and a 3' antileader
(Sethna et al., 1991).
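The composition of the 3'-nested set can be made concrete with a toy model. The sketch below builds each positive-sense sgRNA as the leader joined to a genomic suffix, and each negative-sense sgRNA as its reverse complement; it says nothing about how leader-to-body fusion actually occurs, which is discussed later in this section. The toy genome, leader length, body start positions, and the three-residue poly(A) tail are all hypothetical.

```python
# Hypothetical toy model of the coronavirus 3'-nested set of sgRNAs.
# Positive-sense sgRNA = leader (5' end of genome) + body (a 3' genomic suffix).

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def positive_sgrnas(genome: str, leader_length: int, body_starts):
    """Leader joined to each body; longest sgRNA first."""
    leader = genome[:leader_length]
    return [leader + genome[start:] for start in sorted(body_starts)]


if __name__ == "__main__":
    leader = "GCGCA"
    genome = leader + "U" * 10 + "G" * 5 + "CCCCCAAA"  # toy layout, 3' poly(A) = AAA
    for sg in positive_sgrnas(genome, len(leader), [15, 20]):
        body = sg[len(leader):]
        assert genome.endswith(body)  # every body is a 3' suffix of the genome
        # Negative strand: 5' oligo(U) copied from poly(A), 3' antileader.
        print(sg, reverse_complement(sg))
```

Because every body is a suffix of the genome, each shorter body is also a suffix of every longer one, which is exactly what "3'-nested" means.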

Many advances in understanding the mechanism of coronavirus
RNA synthesis were facilitated by the discovery and cloning of DI
RNAs of MHV (Makino et al., 1985, 1988; van der Most et al., 1991)
and, subsequently, of other coronaviruses (Chang et al., 1994; Mendez
et al., 1996; Penzes et al., 1994). Because they are extensively deleted
genomic variants that propagate by competing for the viral RNA synthesis
machinery, DI RNAs have evolved to retain cis-acting sequence
elements necessary for replication. Manipulations of naturally occurring
and artificially constructed DI RNAs, which are studied by transfection
into infected cells, enabled the mapping of elements from the
genome that participate in replication and transcription (Brian and
Baric, 2005).

Fig. 6. Coronavirus RNA synthesis. The nested set of positive- and negative-strand
RNAs produced during replication and transcription is shown, using MHV as an
example. The inset shows details of the arrangement of leader and body copies of the
transcription-regulating sequence (TRS).

In studies of replication, deletion analyses of various cloned MHV DI
RNAs have variously found that 466, 474, or 859 nucleotides at
the 5' end of the MHV genome are required to support replication (Kim
et al., 1993; Lin and Lai, 1993; Luytjes et al., 1996). The exact value
appears to have depended on which MHV
genomic regions were present in the individual DI RNA with which a
particular analysis was begun. In the very closely related BCoV, 498
nucleotides at the 5' end of a naturally occurring DI RNA have been
shown to suffice for replication (Chang et al., 1994). For TGEV and IBV,
the minimal 5' cis-acting replication signals have thus far been narrowed
to 1348 and 544 nucleotides, respectively (Dalton et al., 2001; Izeta
et al., 1999). In all cases, this region extends well beyond the leader
RNA and includes a portion of the 5' end of the replicase ORF. This
means that coronavirus sgRNAs do not have a sufficient extent of
5' sequence to function as replicons, as was once proposed (Sethna
et al., 1989). Only in BCoV has the 5' cis-acting replication signal been
further defined. Detailed dissections of this element, through structural
probing and functional mutational analyses, have identified four stem-loop
structures essential for RNA replication (Chang et al., 1994, 1996;
Raman and Brian, 2005; Raman et al., 2003). For stems III and IV,
secondary structure, rather than primary sequence, has been shown
to be of functional importance; these structures were found to be conserved
in the more closely related group 2 coronaviruses but not in
SARS-CoV.

At the other end of the genome, deletion analyses found that the
minimal stretch of the 3' terminus able to sustain MHV DI RNA
replication falls between 436 and 462 nucleotides (Kim et al., 1993;
Lin and Lai, 1993; van der Most et al., 1995). Notably, this range of
sequence would include a portion of the adjacent N gene as well as the
entire 301-nucleotide 3' UTR. By contrast, the minimal 3' cis-acting
replication signals for TGEV and IBV were 492 and 338 nucleotides,
respectively. DI RNAs containing such minimal elements were devoid
of any part of the N gene (Dalton et al., 2001; Izeta et al., 1999).
Consistent with this latter finding, it was shown for engineered mutants
of MHV that translocation of the N gene to an upstream genomic
position had no effect on replication (Goebel et al., 2004a). This argues
strongly that no essential 3' cis-acting region is present in the N gene
within the intact MHV genome. If any such region does exist, it must
be able to act at a distance of nearly 1.5 kb. Given the requirement in
MHV for the entire 3' UTR, it was somewhat paradoxical when further
study showed that a minimum of 45-55 nucleotides at the 3' end of
the genome, plus an indeterminate amount of poly(A) tail, sufficed to
support negative-strand RNA synthesis (Lin et al., 1994). From this
result it was concluded that the promoter for negative-strand initiation
lies completely within the last 55 nucleotides of the genome and
that the remainder of the 3' cis-acting element must be required for
positive-strand RNA synthesis. Alternatively, the 3'-most 45-55
nucleotides of the genome may constitute the minimal region able to
associate in trans with the helper virus genome so as to allow initiation of
negative-strand synthesis. A finer examination of the 3' poly(A) tail
requirement found that, for both MHV and BCoV DI RNAs, no fewer
than 5-10 A residues are necessary for replication, and that there is a
correlation between DI RNA replication competence and the ability
to bind poly(A)-binding protein (Spagnolo and Hogue, 2000).

Further investigation of the 3' UTR in MHV and BCoV has produced 
a fairly complete picture of the RNA landscape of this region. At the 
upstream end of the 3' UTR, two functionally essential structures have 
been demonstrated by chemical and enzymatic probing and by genetic 
studies with both DI RNAs and constructed viral mutants. The first 
structure is a bulged stem-loop (Hsue and Masters, 1997; Hsue et al., 
2000; Goebel et al., 2004a); the second is an adjacent RNA pseudoknot 
(Goebel et al., 2004a; Williams et al., 1999). An intriguing property of 
these upstream RNA elements is that they partially overlap, that is, 
the bulged stem-loop and the pseudoknot would not be able to fold up 
simultaneously. It has thus been proposed that they constitute components
of a molecular switch that is operative at some stage of RNA
synthesis, although a target of their putative regulation has not yet
been identified (Goebel et al., 2004a). Further downstream in the MHV
genome is a complex RNA secondary structural element that takes up
most of the remainder of the 3' UTR (Johnson et al., 2005; Liu et al.,
2001). Although this structure is only poorly conserved with the structure
predicted for the corresponding region of the BCoV 3' UTR, mutations
made in one stem that is highly conserved between the two
viruses were found to be deleterious to DI RNA replication. Surprisingly,
in the heart of this most divergent region of the 3' UTR is found
an octanucleotide motif, 5'-GGAAGAGC-3', that is absolutely conserved
in the 3' UTRs of all coronaviruses in all three groups.
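Why two overlapping elements cannot fold at the same time can be expressed as a simple set check: a nucleotide can pair with only one partner, so two structures whose base pairs share any nucleotide are mutually exclusive. The sketch represents each structure as a list of (i, j) base pairs and ignores non-canonical and tertiary interactions. The coordinates in the example are hypothetical and are not the actual positions of the MHV bulged stem-loop or pseudoknot.

```python
# Hypothetical sketch: can two RNA secondary structures form simultaneously?
# Each structure is a list of (i, j) base pairs (1-based nucleotide positions).


def paired_positions(structure):
    """All nucleotide positions that participate in a base pair."""
    return {pos for pair in structure for pos in pair}


def shared_positions(structure_a, structure_b):
    """Positions required to pair in both structures, in ascending order."""
    return sorted(paired_positions(structure_a) & paired_positions(structure_b))


def can_coexist(structure_a, structure_b):
    """True if no nucleotide is required to pair in both structures."""
    return not shared_positions(structure_a, structure_b)


if __name__ == "__main__":
    stem_loop = [(1, 30), (2, 29), (3, 28), (4, 27)]  # hypothetical positions
    pseudoknot = [(27, 60), (28, 59), (40, 75)]       # hypothetical positions
    print(shared_positions(stem_loop, pseudoknot))
    print(can_coexist(stem_loop, pseudoknot))
```

Here positions 27 and 28 are claimed by both structures, so forming one requires disrupting the other, which is the property that makes a switch between the two conformations conceivable.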

The presence of the 3' UTR stem-loop and pseudoknot appears to 
be a distinguishing feature of the group 2 coronaviruses. The group 1 
coronaviruses all contain a highly conserved pseudoknot (Williams 
et al., 1999), but no detectable counterpart of the bulged stem-loop in 
either upstream or downstream proximity to it. On the other hand, the 
group 3 coronaviruses have a highly conserved and functionally essential
stem-loop (Dalton et al., 2001), but merely a poor candidate for the
pseudoknot structure can be found nearby (Williams et al., 1999). Only
the group 2 coronaviruses have both elements, and, in all cases, the 
elements overlap in the same fashion. Despite sequence divergence 
among the 3' UTRs of group 2 coronaviruses, these genomic segments 
are functionally equivalent. The BCoV 3' UTR was found to be able to 
entirely replace the MHV 3' UTR (Hsue and Masters, 1997). Moreover, 
it was demonstrated that replication of a BCoV DI RNA could be 
supported by any of a number of closely related group 2 helper viruses, 
including MHV (Wu et al., 2003). More strikingly yet, the SARS-CoV 3'
UTR was found to be able to entirely replace the MHV 3' UTR (Goebel
et al., 2004b). Thus, the replicase machinery of a group 2 coronavirus,
MHV, is able to recognize and use the 3' cis-acting structures and 
sequences of other group 2 coronaviruses, BCoV and SARS-CoV. By 
contrast, the MHV 3' UTR cannot be replaced with either the group 1 
TGEV 3' UTR or the group 3 IBV 3' UTR. 

Numerous investigations have focused on the intriguing nature of 
coronavirus sgRNA transcription. The sites of leader-to-body fusion in 
the sgRNAs occur at loci in the genome that contain a short run of 
sequence that is identical, or nearly identical, to the 3' end of the leader 
RNA (Fig. 6). These sites are called transcription-regulating sequences 
(TRSs); they have also been designated transcription-associated sequences
(TASs) or intergenic sequences (IGs or IGSs). TRSs are fairly
well conserved within each coronavirus group. The core consensus TRS 
is 5'-AACUAAAC-3' for group 1; 5'-AAUCUAAAC-3' for group 2 (except 
for SARS-CoV, for which it is 5'-AAACGAAC-3'); and 5'-CUUAACAA-3' 
for group 3 (Thiel et al., 2003a; van der Most and Spaan, 1995). 
Not every TRS in a given virus conforms exactly to the consensus 
sequence; a number of allowable variant bases are found in individual 
TRSs. 
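Because TRSs are defined by similarity to a core consensus, candidate sites can be located with a simple scan of the positive-strand sequence. The sketch below uses the core TRS sequences listed above and counts mismatches in each window. The mismatch tolerance (`max_mismatches`) is an assumption standing in for the allowable variant bases, and the test sequence is hypothetical; a real analysis would also need to weigh the flanking sequence context of each hit.

```python
# Hypothetical sketch: scan a positive-strand sequence for matches to a core TRS.
# Core sequences are the consensus TRSs quoted in the text (5' to 3').

CORE_TRS = {
    "group 1": "AACUAAAC",
    "group 2": "AAUCUAAAC",
    "SARS-CoV": "AAACGAAC",
    "group 3": "CUUAACAA",
}


def find_trs(sequence: str, core: str, max_mismatches: int = 1):
    """Return (0-based position, mismatches) for each window within tolerance."""
    hits = []
    width = len(core)
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        mismatches = sum(a != b for a, b in zip(window, core))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    toy = "GGAAUCUAAACGGGAAUCUAAUCGG"  # hypothetical sequence
    print(find_trs(toy, CORE_TRS["group 2"]))
```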

It was clear from very early studies that the sgRNAs are formed by a 
discontinuous, cotranscriptional process and that they are not
produced by splicing of a full-length genomic precursor (Jacobs et al.,
1981; Stern and Sefton, 1982). As for RNA replication, the first
systematic means of addressing the mechanism of transcription came
from the manipulation of engineered DI RNAs. The efficiency of fusion 
at a given TRS was at first thought to be mediated solely by base-pairing 
between the 3' end of the leader and the complement of the TRS. 
However, studies with DI RNAs containing authentic and mutated 
TRSs led many investigators to conclude that, beyond a minimum 
threshold of potential base pairing, other factors must predominate 
(Hiscox et al., 1995; Makino et al., 1991; van der Most et al., 1994). DI
RNA studies thus provided the first indication of the importance of the
local sequence context of the TRS and the position of the TRS relative to
the 3' end of the genome (Joo and Makino, 1995; Krishnan et al., 1996;
Ozdarendeli et al., 2001; van Marle et al., 1995).

The original conceptual framework for many studies was that of 
leader-primed transcription. In this model, sgRNAs were envisioned 
to be generated during positive-strand RNA synthesis. It was proposed 
that the polymerase pauses near the end of the leader sequence and 
detaches with the nascent free leader RNA. This step is followed by 
reattachment of the leader RNA to the complement of a TRS at an 
internal portion of the negative-strand template, from which the
nascent RNA is then elongated (Lai, 1986). A refinement of this idea was
that leader-to-body fusion results from quasi-continuous synthesis
across two distant portions of a looped-out template, which are brought
together via protein-RNA and protein-protein interactions (Lai et al.,
1994; Zhang et al., 1994).

More recently, accumulated experimental results, while retaining 
the notion of a looped-out template, have been taken to support a 
mechanism in which the discontinuous step in sgRNA synthesis occurs 
during negative-strand RNA synthesis (Fig. 7) (Sawicki and Sawicki, 
1998, 2005). In this model, the viral polymerase, starting from the 3' 
end of a genomic template, switches templates at an internal TRS and 
resumes synthesis at the homologous TRS sequence at the 3' end of 
the genomic leader RNA. The resulting negative-strand sgRNA, in 
association with positive-strand gRNA, then serves as the template 
for synthesis of multiple copies of the corresponding positive-strand 
sgRNA. This new view originated with the discovery of negative- 
strand sgRNAs (Sethna et al., 1989) and with the demonstration that 
free leader RNA could not be detected in infected cells (Chang et al., 
1994). Most (Baric and Yount, 2000; Sawicki and Sawicki, 1990; 
Sawicki et al., 2001; Schaad and Baric, 1994), although not all (An 
and Makino, 1998; An et al., 1998; Mizutani et al., 2000), subsequent 
biochemical work supported the contention that the negative-strand 
sgRNA species are kinetically competent to serve as templates for 





Fig 7. Model for discontinuous negative-strand transcription. Negative-strand 
sgRNAs are initiated at the 3' end of the gRNA template. Elongation proceeds as far as 
a body copy of a transcription-regulating sequence (TRS). A strand-switching event then 
occurs, pairing the newly transcribed negative-sense body TRS with the leader copy 
of the TRS, from which point transcription resumes. A complex of the (+)gRNA and 
the (-)sgRNA then serves as the template for synthesis of multiple (+)sgRNAs. 













PAUL S. MASTERS 


positive-strand sgRNAs. In addition, some of the strongest evidence for 
negative-strand discontinuous sgRNA synthesis came from landmark 
studies using a full-length infectious cDNA of equine arteritis virus, the
prototype member of the closely related arterivirus family. This work
made use of a robust system in which both the leader copy and one or
multiple body copies of the TRS were singly or simultaneously mutated
in the genome; RNA synthesis in this system could be assayed in
the initial passage of infectious RNA (Pasternak et al., 2001, 2003,
2004; van Marle et al., 1999). The arterivirus results have been
corroborated, in part, by experiments enabled by the development of reverse
genetic approaches for TGEV and MHV (Alonso et al., 2002; Curtis 
et al., 2004; de Haan et al., 2002a,b; Sola et al., 2005; Zuniga et al., 
2004). At this time, there is a broad, but not universal, consensus that 
for coronaviruses, as well as for other nidoviruses, both replication and 
transcription initiate with negative-strand RNA synthesis. However, 
much further work needs to be done to elucidate the details of the 
template-switching step of discontinuous transcription. It will also be 
necessary to extend to the coronaviruses principles that have been 
more clearly established for the arteriviruses. 
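The template-switching step of the negative-strand model (Fig. 7) can be expressed as simple string operations. The Python sketch below is a toy model, not a description of the enzymology: it assumes exact TRS matches, treats the first TRS copy in the genome as the leader TRS, writes all sequences 5'-to-3', and uses a short hypothetical genome. Synthesis copies the genome from its 3' end back through a chosen body TRS, switches to the leader TRS, and finishes by copying the leader; the reverse complement of that negative strand is the corresponding positive-strand sgRNA.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def trs_positions(genome: str, trs: str) -> list[int]:
    """0-based starts of exact TRS copies; the first is taken as the leader TRS."""
    return [i for i in range(len(genome) - len(trs) + 1) if genome.startswith(trs, i)]


def negative_strand_sgrna(genome: str, trs: str, body_trs_pos: int) -> str:
    """Toy discontinuous negative-strand synthesis; result written 5'->3'."""
    leader_trs_pos = trs_positions(genome, trs)[0]
    if body_trs_pos <= leader_trs_pos or not genome.startswith(trs, body_trs_pos):
        raise ValueError("body_trs_pos must mark a TRS downstream of the leader TRS")
    # 1. Initiate at the genome 3' end and copy back through the body TRS.
    nascent = reverse_complement(genome[body_trs_pos:])
    # 2. Strand switch: the nascent 3' end (TRS complement) pairs with the leader TRS.
    # 3. Resume synthesis by copying the leader upstream of the leader TRS.
    nascent += reverse_complement(genome[:leader_trs_pos])
    return nascent


def positive_sgrna(genome: str, trs: str, body_trs_pos: int) -> str:
    """Positive-strand sgRNA copied from the negative-strand template."""
    return reverse_complement(negative_strand_sgrna(genome, trs, body_trs_pos))


# Hypothetical genome: a leader followed by three TRS-prefixed genes.
trs = "UCUAAAC"
genome = "GGAUUC" + trs + "GCGCGCGC" + trs + "AAAGGG" + trs + "CCCUUU"
for pos in trs_positions(genome, trs)[1:]:
    print(pos, positive_sgrna(genome, trs, pos))
```

Because each product retains everything downstream of its body TRS, the sketch also yields a 3'-coterminal nested set of sgRNAs, each carrying the leader fused to a single TRS copy.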


B. RNA Recombination 

An important feature of coronavirus RNA synthesis is the high rate 
of homologous and nonhomologous RNA-RNA recombination that has 
been demonstrated to occur among selected and unselected markers 
during the course of infection. Although most experimental work in 
this area has been performed with MHV (Keck et al., 1987, 1988a,b;
Makino et al., 1986, 1987), a high frequency of homologous recombination
is clearly an attribute of the entire coronavirus family, given
that it has been observed in other viruses in all three groups: TGEV
(Sanchez et al., 1999), FIPV (Haijema et al., 2003; Herrewegh et al.,
1998), BCoV (Chang et al., 1996), and IBV (Cavanagh et al., 1992;
Kottier et al., 1995; Kusters et al., 1990; Wang et al., 1993). In addition,
nonhomologous recombination was likely the mechanism by which the
various accessory protein genes were acquired in all three groups.

RNA recombination is thought to result from a copy-choice
mechanism, as originally described for poliovirus (Kirkegaard and Baltimore,
1986). In this scheme, the viral polymerase, with its nascent RNA 
strand intact, detaches from one template and resumes elongation at 
the identical position, or a similar position, on another template. In 
MHV, recombination has been shown to take place along the entire 
length of the genome at an estimated frequency of 1% per 1.3 kb 
(almost 25% over the entire genome), the highest rate observed for any
RNA virus (Baric et al., 1990). On a fine scale, the sites of
recombination were seen to be random (Banner and Lai, 1991), although strong
selective pressures were able to create the appearance of local
clustering of recombinational hot spots in one study (Banner et al., 1990).
Some results suggest that the rate of recombination increases across 
the entire MHV genome, from 5' to 3' end (Fu and Baric, 1992, 1994). 
This gradient may result from homologous recombination between 
genomic and subgenomic RNAs, since the latter would provide a source 
of donor and acceptor templates that would become more numerous as 
a function of proximity to the 3' end of the genome. 
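The genome-wide estimate follows from simple arithmetic. The Python sketch below assumes an MHV genome length of about 31.3 kb (the text states only the per-1.3 kb rate) and compares a linear extrapolation with a model that treats each 1.3 kb interval as an independent chance of crossover. A second function illustrates the proposed 5'-to-3' gradient by counting how many 3'-coterminal RNA species cover a given position; the sgRNA start sites are hypothetical, and each species is counted once regardless of its actual abundance.

```python
def genome_wide_recombination(rate_per_interval: float, interval_kb: float,
                              genome_kb: float) -> tuple[float, float]:
    """Return (linear, independent-interval) estimates of the chance of
    at least one crossover across the whole genome."""
    n_intervals = genome_kb / interval_kb
    linear = rate_per_interval * n_intervals
    independent = 1 - (1 - rate_per_interval) ** n_intervals
    return linear, independent


def templates_covering(position_kb: float, sg_body_starts_kb: list[float]) -> int:
    """Count RNA species whose body includes `position_kb`: the genome plus
    every 3'-coterminal sgRNA whose body starts at or upstream of it.
    Shared leader sequences and differences in abundance are ignored."""
    return 1 + sum(start <= position_kb for start in sg_body_starts_kb)


linear, independent = genome_wide_recombination(0.01, 1.3, 31.3)  # 31.3 kb is an assumption
print(f"linear: {linear:.1%}, independent intervals: {independent:.1%}")
# -> linear: 24.1%, independent intervals: 21.5%

hypothetical_starts = [22.0, 24.0, 26.0, 28.0, 30.0]  # illustrative, not MHV coordinates
print([templates_covering(p, hypothetical_starts) for p in (5, 23, 27, 31)])
# -> [1, 2, 4, 6]
```

The linear figure of about 24% is consistent with the "almost 25%" quoted above; the somewhat lower independent-interval value shows how much such extrapolations depend on the model assumed.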

Most evidence supports a model for viral RNA recombination having 
three mechanistic requirements (Lai, 1992). First, the RNA
polymerase must pause during synthesis. This may be an intrinsic property of
the enzyme, or it may result from the enzyme encountering a template 
secondary structure that exceeds a certain stability threshold. Second, 
a new template must be in physical proximity. Third, some property of 
the new template must allow the transfer of the nascent RNA strand 
and the resumption of RNA synthesis. Alternatively, strand transfer 
could result from a processive mechanism that does not require
polymerase dissociation (Jarvis and Kirkegaard, 1991). For poliovirus,
classical experiments showed that RNA recombination occurs during 
negative-strand RNA synthesis (Kirkegaard and Baltimore, 1986), 
most likely because positive-strand acceptor templates far outnumber 
negative strands (Jarvis and Kirkegaard, 1992). The same is likely to 
be true for coronaviruses, since they, too, have a high ratio of positive- 
strand to negative-strand RNA (Sawicki and Sawicki, 1986, 1990; 
Sethna et al., 1989). Moreover, for MHV, most or all negative-strand 
RNA is found duplexed with positive-strand RNA (Lin et al., 1994; 
Sawicki and Sawicki, 1986). Thus, there may be a bias toward 
negative-strand recombination simply because positive-strand RNA 
is the most available (single-stranded) acceptor template. However, 
instances of coronavirus homologous recombination that occurred
during positive-strand RNA synthesis have been documented (Liao and
Lai, 1992). Also, work with extremely defective MHV mutants has 
shown that sufficiently strong selective pressures can reveal unusual 
nonhomologous rearrangements, including recombination between 
negative- and positive-strand RNA, which are likely to be constantly 
occurring at a low frequency during viral RNA synthesis. 
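The three requirements of this model can be written as explicit conditions in a toy copy-choice function. The Python sketch below is illustrative only: the sequences are hypothetical, aligned, and written in the direction of synthesis; the product is shown in template sense rather than as a complementary strand; and the four-nucleotide homology check stands in, as an assumption, for whatever property of the acceptor actually permits strand transfer.

```python
from typing import Optional


def copy_choice(donor: str, acceptor: Optional[str], pause_at: Optional[int],
                homology: int = 4) -> str:
    """Toy copy-choice recombination between two aligned templates.

    Requirement 1 (pause): `pause_at` is where the polymerase pauses;
    None means it does not pause. Requirement 2 (proximity): `acceptor`
    is None when no second template is nearby. Requirement 3 (transfer):
    the last `homology` nucleotides already copied must match the acceptor
    at the same positions, so the nascent strand can anneal there.
    """
    if pause_at is None or acceptor is None:
        return donor  # no switch: the donor is copied to the end
    start = max(0, pause_at - homology)
    if donor[start:pause_at] != acceptor[start:pause_at]:
        return donor  # nascent strand cannot anneal, so synthesis stays on the donor
    # Switch templates with the nascent strand intact, resuming at the same position.
    return donor[:pause_at] + acceptor[pause_at:]


donor = "AAAAGCAUGCAUCCCC"     # hypothetical: 5' marker AAAA, 3' marker CCCC
acceptor = "UUUUGCAUGCAUGGGG"  # hypothetical: 5' marker UUUU, 3' marker GGGG
print(copy_choice(donor, acceptor, pause_at=10))  # -> AAAAGCAUGCAUGGGG (recombinant)
print(copy_choice(donor, acceptor, pause_at=2))   # -> AAAAGCAUGCAUCCCC (no homology)
```

Because the switch lands at the identical aligned position, the product is a homologous recombinant; letting the acceptor position differ from `pause_at` would be one way to extend the sketch to the nonhomologous events described above.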

One form of nonhomologous recombination that occurs between 
genomic and subgenomic RNA has been hypothesized to result from 
the collapse of the transcription complex during negative-strand 
discontinuous transcription (Kuo and Masters, 2002). Such a
disruption, followed by resumption of replicative antigenome synthesis,
would leave a partial copy of the leader sequence embedded at an
internal point in the genome, near the junction between two genes.
This type of recombinant was selected repeatedly in revertants of a
severely impaired MHV M protein mutant. Beyond such laboratory
selections, similar transcriptional collapse events may have been a
significant factor in coronavirus evolution. Remnants of leader RNAs were found in the
genomes of wild-type HCoV-OC43 (Mounir and Talbot, 1993) and in a
mutant of MHV strain S (Taguchi et al., 1994). Most strikingly, the
recently described HCoV-HKU1 genome contains two very significant
segments of embedded leader sequence (Woo et al., 2005). Each of these
leader remnants occurs at a site where there is an apparent deletion of 
an entire accessory gene, with respect to the genomic layouts of the 
closest relatives of this virus, MHV and BCoV. 


C. Replicase Complex 
1. Ribosomal Frameshifting 

The replicase complex that carries out the intricacies of viral RNA 
replication and transcription is encoded by the first gene of the
coronavirus genome. This huge gene occupies roughly two-thirds of the
genome and contains two ORFs, the complete expression of which is 
dependent on a programmed ribosomal frameshift. The discovery of 
coronavirus ribosomal frameshifting resulted from the completion 
of the sequence of IBV, the first member of the family for which an 
entire genomic sequence was obtained (Brierley et al., 1987). This 
revealed a small (43 nt) overlap between ORF 1a (11.9 kb) and ORF
1b (8.1 kb), the latter in the -1 frame relative to the former; moreover,
there was no sgRNA that could serve as the mRNA for ORF 1b. This
arrangement was subsequently found to exist for all coronaviruses. 
Thus, ribosomal frameshifting, which had previously been seen only in 
retroviruses (Jacks et al., 1988), was proposed as a mechanism for 
expression of ORF 1b. Programmed frameshifting was demonstrated
for the IBV gene 1a/1b overlap region in reporter gene constructs in
experiments using in vitro translation systems and, in some cases, 
cellular expression systems (Brierley et al., 1989). In such systems, a 
frameshifting incidence of 25-30% was measured, representing an
efficiency far greater than the 5% seen at the retroviral gag-pol
junction. It should be noted, however, that the efficiency of in vivo
frameshifting occurring in cells infected with IBV, or any other coronavirus,
has not yet been quantitated; nor is it known whether that value
remains constant over the course of infection. 
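If every ribosome that reaches the slippery site either shifts into ORF 1b or terminates at the end of ORF 1a, the frameshifting efficiency fixes the ratio of pp1a to pp1ab. The Python sketch below makes that assumption explicit and ignores premature termination and ribosome drop-off; it applies the in vitro figures quoted above, which, as just noted, have not been confirmed for infected cells.

```python
def pp1a_to_pp1ab_ratio(frameshift_efficiency: float) -> float:
    """Ratio of pp1a (no frameshift) to pp1ab (frameshift) molecules,
    assuming each ribosome at the slippery site does one or the other."""
    if not 0.0 < frameshift_efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return (1.0 - frameshift_efficiency) / frameshift_efficiency


# IBV in vitro range, plus the retroviral gag-pol value for comparison.
for efficiency in (0.25, 0.30, 0.05):
    ratio = pp1a_to_pp1ab_ratio(efficiency)
    print(f"{efficiency:.0%} frameshifting -> pp1a:pp1ab = {ratio:.1f}:1")
```

At 25-30% efficiency the model gives roughly 2.3 to 3 molecules of pp1a per molecule of pp1ab, compared with about 19:1 at a 5% efficiency.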

IBV ribosomal frameshifting was found to depend on two genomic 
RNA elements (Fig. 8): a heptanucleotide “slippery sequence” 
(UUUAAAC) and a downstream, hairpin-type pseudoknot (Brierley 
et al., 1989). In addition, the spacing between these elements is 
critical. It is thought that the pseudoknot impedes the progress of 
the elongating ribosome. With some fixed probability, the delay
required for the ribosome to melt out this secondary structural element
allows the simultaneous slippage of the P and A site tRNAs by one base
in the -1 direction. Normal translational elongation then resumes.
Studies of the kinetics of translation, using a model mRNA based on
the IBV frameshifting region, support the idea of ribosomal pausing at
the pseudoknot (Somogyi et al., 1993). Moreover, mutational studies of
IBV frameshifting (Brierley et al., 1989) and direct mass spectrometric
analysis of the SARS-CoV frameshifted polypeptide product (Baranov
et al., 2005) have confirmed both the locus of the slippage site and the
occurrence of simultaneous slippage. The reason why coronaviruses 
employ ribosomal frameshifting as a gene expression strategy is less 
well established at this time. The explanation most commonly given is 
that, as for retroviruses, the frameshifting mechanism provides a 
fixed ratio of translation products, in the necessary proximity of one 
another, for assembly into a macromolecular complex. It could also be 
speculated that frameshifting forestalls expression of the enzymatic 



Fig 8. RNA elements required for ribosomal frameshifting. The expanded region 
shows RNA sequences and secondary structures that program the frameshift, using 
IBV as an example. 
products of ORF 1b until a platform and a cellular environment for
them have been prepared by the products of ORF 1a.

The two genomic components required for ribosomal frameshifting 
have been investigated in considerable detail. Exhaustive mutagenesis 
of the slippery sequence showed that frameshifting could be facilitated 
by a number of heptameric sequences of the form XXXYYYN, where 
XXX and YYY are the postslippage P and A site codons, respectively 
(Brierley et al., 1992). Hierarchies of preferred combinations of X, Y, 
and N were defined, and these indicated a major role for the strength of 
the A-site tRNA interaction. However, although some heptanucleotides
showed a frameshifting efficiency nearly as high as that of the
wild type, it must be noted that, to date, all known coronaviruses have
been found to contain a slippery sequence of UUUAAAC (Brian and
Baric, 2005; Plant et al., 2005).
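The XXXYYYN notation can be unpacked into the codons occupied before and after slippage. In the zero frame the heptamer is read as X XXY YYN, so the P and A sites hold XXY and YYN; after the simultaneous -1 shift they hold XXX and YYY. The Python sketch below is a hypothetical helper, not an established tool: it checks the identity constraints and reports both readings.

```python
def read_slippery_heptamer(heptamer: str) -> dict[str, str]:
    """Return zero-frame and -1-frame P/A-site codons for an X XXY YYN heptamer.

    Position 0 is assumed to be the last nucleotide of the preceding
    zero-frame codon, so the zero-frame P and A codons are positions 1-3
    and 4-6, and the -1-frame codons are positions 0-2 and 3-5.
    """
    h = heptamer.upper().replace("T", "U")
    if len(h) != 7 or not set(h) <= set("ACGU"):
        raise ValueError("expected a 7-nt RNA sequence")
    if not (h[0] == h[1] == h[2] and h[3] == h[4] == h[5]):
        raise ValueError(f"{heptamer!r} does not have the form XXXYYYN")
    return {
        "P_zero_frame": h[1:4],
        "A_zero_frame": h[4:7],
        "P_minus1": h[0:3],
        "A_minus1": h[3:6],
    }


print(read_slippery_heptamer("UUUAAAC"))
# -> {'P_zero_frame': 'UUA', 'A_zero_frame': 'AAC', 'P_minus1': 'UUU', 'A_minus1': 'AAA'}
```

The helper checks only the XXXYYYN form; the hierarchy of preferred X, Y, and N combinations defined by Brierley et al. (1992) is not encoded.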

The second component, the pseudoknot, has similarly been
examined through exhaustive mutagenesis (Brierley et al., 1991). Although
the involvement of a downstream RNA secondary structural element
in ribosomal frameshifting was first recognized with retroviruses
(Jacks et al., 1988), the earliest demonstration that the requisite
structure is a pseudoknot came from the study of IBV (Brierley et al., 1989).
This demonstration was made initially by classic stem replacement
mutagenesis and subsequently by intensive modification of pseudoknot
elements; all of the results of both types of studies supported the
proposed structure. It was also revealed that the length of stem 1 is
very important for frameshifting efficiency (Napthine et al., 1999) and
that it is the structure, not the primary sequence, that is significant for 
both stems 1 and 2. Higher-order structure was also found to be 
critical: the pseudoknot could not be replaced by a single stem-loop of 
the same stability, containing the identical base pairs as the sum of the 
two pseudoknot stems (Brierley et al., 1991). 
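What distinguishes a pseudoknot from a simple stem-loop is that the base pairs of its two stems interleave: a pair (i, j) in stem 1 and a pair (k, l) in stem 2 satisfy i < k < j < l. The Python sketch below parses an extended dot-bracket string, using parentheses for stem 1 and square brackets for stem 2, and tests for such crossing pairs. The structure shown is a hypothetical H-type layout with a short loop 1 and a longer loop 2 and arbitrary stem lengths; it is not the IBV pseudoknot.

```python
from itertools import combinations

BRACKETS = {"(": ")", "[": "]"}


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Parse dot-bracket notation with () and [] into sorted base pairs (i, j)."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks: dict[str, list[int]] = {open_: [] for open_ in BRACKETS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(j)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def has_crossing_pairs(pairs: list[tuple[int, int]]) -> bool:
    """True if any two pairs interleave (i < k < j < l), the signature of a pseudoknot."""
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            return True
    return False


h_type = "(((((..[[[[)))))........]]]]"  # hypothetical: stem 1 = (), stem 2 = []
stem_loop = "(((((.........)))))"        # single hairpin for comparison
print(has_crossing_pairs(parse_pairs(h_type)))     # -> True
print(has_crossing_pairs(parse_pairs(stem_loop)))  # -> False
```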

The frameshifting signals of other coronaviruses have been found to 
generally conform to the rules defined for IBV, although additional 
complexities have emerged. With the completion of the genomic
sequences of the group 1 coronaviruses HCoV-229E (Herold and Siddell,
1993) and TGEV (Eleouet et al., 1995), an “elaborated” pseudoknot was 
proposed for members of this group, containing a third stem falling 
within an unusually large loop 2. It is currently unresolved whether
the group 1 elaborated pseudoknot is the operative structure in
frameshifting, as suggested by some mutational evidence (Herold and
Siddell, 1993). Alternatively, loop 2 can be assigned as for the other
coronaviruses, with the extra group 1-specific element instead forming
a long-range kissing loop interaction between the upstream
arm of pseudoknot stem 2 and the loop of a downstream stem-loop
(Plant et al., 2005). Analysis of the sequence of the frameshifting
region of the SARS-CoV genome led to the prediction of a third
stem-loop within loop 2 of the pseudoknot (Ramos et al., 2004). This third
element is situated differently from the additional stem of the group 1
elaborated pseudoknot, but it is similar to the potential bulged
stem-loop that was earlier proposed to reside in loop 2 of the pseudoknot of
the torovirus Berne virus (Snijder et al., 1990). Further computational 
analysis has similarly found a possible third stem within loop 2 of the 
frameshifting pseudoknots of all coronaviruses, and the SARS-CoV 
stem 3 structure has been shown to be consistent with NMR data 
and nuclease mapping (Plant et al., 2005). The role of stem 3 in 
ribosomal frameshifting is, as yet, unclear. Contrary to the previous 
results in the IBV system, mutagenesis studies suggest that both the 
primary sequence and the structures of the SARS-CoV stems 2 and 3 
affect the efficiency of frameshifting (Baranov et al., 2005; Plant et al.,
2005). On the other hand, the complete deletion of stem 3 is not 
detrimental to frameshifting. This seeming discrepancy has led to 
the suggestion that stem 3 plays an as yet undiscovered regulatory 
role, perhaps in the switch from genome translation to replication 
(Plant et al., 2005). 

2. Replicase Proteins 

The end result of the ribosomal frameshifting-mediated translation 
of the replicase gene is the synthesis of two very large polyproteins, 
pp1a and pp1ab. These range from 440 to 500 kDa and from 740 to 810
kDa, respectively, and they are cotranslationally processed by two or 
three internally contained proteinase activities. The Herculean task of 
mapping all of the polyprotein processing events began at a time before 
investigators were even aware of the full sizes of coronavirus genomes 
(Denison and Perlman, 1986, 1987; Soe et al., 1987). Only relatively 
recently have replicase cleavage maps been completed for at least 
one representative from each coronavirus group (Bonilla et al., 1997; 
Kanjanahaluethai et al., 2003; Lim and Liu, 1998; Liu et al., 1998; Lu 
and Denison, 1997; Pinon et al., 1997; Schiller et al., 1998; Xu et al., 
2001; Ziebuhr and Siddell, 1999; Ziebuhr et al., 2001). Knowledge 
gained from these efforts allowed the informed prediction (Snijder 
et al., 2003; Thiel et al., 2003a) and rapid experimental verification 
(Harcourt et al., 2004; Prentice et al., 2004b) of the processing pathway 
for the SARS-CoV replicase. 

The final products of the autoproteolytic cleavage of pp1a and
pp1ab are 16 nonstructural proteins, designated nsp1-nsp16 (Fig. 9).

Fig 9. Protein products of the replicase gene. Cleavage sites and processed products of
pp1a (nsp1-nsp11) and of pp1ab (nsp1-nsp10, nsp12-nsp16) are shown. Predicted and/or
experimentally demonstrated activities are indicated.


Nsp1-nsp11 are derived from pp1a, whereas nsp1-nsp10 and
nsp12-nsp16 are derived from pp1ab. Thus, all products processed
from pp1a are common to those processed from pp1ab, except for
nsp11, which is an oligopeptide generated when ribosomal
frameshifting does not occur. For IBV, which lacks a counterpart of nsp1, there
are 15 final products of polyprotein cleavage. These are numbered
beginning with nsp2, in order to maintain correspondence with their
homologs in the other coronaviruses. Comparative layouts and
processing schemes for the replicase genes of all three coronavirus groups
can be found in the review by Ziebuhr (2005) and references therein.
Detailed lists and schematics of cleavage sites, the proteinases
responsible, and the resulting nsp products for HCoV-229E, MHV, and IBV can
be found in Table 2 and Figure 2 of the review by Ziebuhr et al. (2000). It
should be noted that partial proteolytic products may also be significant
in the processing scheme. The efficiency of cleavage at particular
polyprotein sites may be regulated by both the exact primary sequence
at the site and the site’s accessibility to the proteinase (Ziebuhr, 2005; 
Ziebuhr et al., 2000). 
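The bookkeeping in this paragraph can be checked with set arithmetic. The Python sketch below simply encodes the product lists stated above (nsp1-nsp11 from pp1a; nsp1-nsp10 plus nsp12-nsp16 from pp1ab) and derives the shared, polyprotein-specific, and IBV totals. It models naming only, not cleavage order, proteinase assignments, or the partial products mentioned above; the helper names are illustrative.

```python
def nsp_range(first: int, last: int) -> set[str]:
    """Names nsp<first> through nsp<last>, inclusive."""
    return {f"nsp{i}" for i in range(first, last + 1)}


def by_number(names: set[str]) -> list[str]:
    """Sort nsp names numerically (nsp2 before nsp10)."""
    return sorted(names, key=lambda name: int(name[3:]))


PP1A = nsp_range(1, 11)
PP1AB = nsp_range(1, 10) | nsp_range(12, 16)

shared = PP1A & PP1AB       # products common to both polyproteins
pp1a_only = PP1A - PP1AB    # made only when frameshifting does not occur
pp1ab_only = PP1AB - PP1A   # encoded downstream of the frameshift
all_products = PP1A | PP1AB
ibv_products = all_products - {"nsp1"}  # IBV lacks an nsp1 counterpart

print(len(shared), by_number(pp1a_only), by_number(pp1ab_only))
# -> 10 ['nsp11'] ['nsp12', 'nsp13', 'nsp14', 'nsp15', 'nsp16']
print(len(all_products), len(ibv_products))
# -> 16 15
```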

Elucidation of the precise roles of nsp1-nsp16 will be the next major
undertaking. Functions for many domains of the coronavirus replicase
were predicted by pioneering bioinformatics methods well before the
term “bioinformatics” was invented (Gorbalenya et al., 1989; Lee et al.,
1991). While knowledge about many of the replicase proteins is still at
a very early stage, substantial progress has been made for others. 
Research in this field is proceeding at an unprecedented pace for 
reasons of both opportunity and necessity. First, tools that were not 
previously available, most notably reverse genetics systems for the 
replicase gene, are now at the disposal of coronavirus researchers. 
Second, the replicase products present a wide array of promising 
targets for anti-SARS therapeutics. The information that is currently 
at hand points to a correspondence between the genomic order of the 
encoded activities of the replicase gene and the temporal program of
infection. The products of pp1a appear to function to prepare the cell
for infection and to assemble the machinery for RNA synthesis. Then,
the products that are unique to pp1ab carry out the actual catalysis of
RNA replication and transcription. 

The very first mature translation product of MHV pp1a, nsp1, has
been shown to play a role in cell cycle arrest. It may thus prepare a
favorable cellular environment for viral replication (Chen and Makino,
2004; Chen et al., 2004). The next cleavage product, nsp2, diverges
considerably among different coronaviruses, and no function for it has
yet been predicted or demonstrated. Surprisingly, deletion of the
complete nsp2 region from the genome of MHV or SARS-CoV was not
lethal. However, nsp2 deletion mutants showed delayed viral growth
kinetics (Graham et al., 2005). Other early replicase products are the
enzymes that carry out the processing of the polyproteins: papain-like
proteinases, which are in nsp3 (Baker et al., 1993), and the main
proteinase, which is in nsp5 (Lu et al., 1995). Most coronaviruses have
two papain-like proteinases, designated PL1pro and PL2pro. By
contrast, IBV and SARS-CoV have a single PLpro. PL1pro and PL2pro may
have arisen by duplication, and in vitro, they appear to have some 
redundancy in their activities. However, for HCoV-229E, a genetic 
analysis showed that PL2pro is essential, and the presence of both
PL1pro and PL2pro was found to confer a clear advantage in viral fitness
(Thiel and Siddell, 2005). In addition to the papain-like proteinases,
nsp3 in many coronaviruses contains a domain that harbors
ADP-ribose-1''-monophosphatase activity (Putics et al., 2005). The
construction of active-site mutants has shown that this activity is dispensable
for replication of HCoV-229E in tissue culture. Although the cellular 
homolog of this enzyme plays a role in tRNA processing, the biological 
significance of the virally encoded activity is unknown. Nsp3 can also 
contain some variable domains. In HCoV-HKU1, as many as 14
tandem repeats of an acidic decapeptide are present in an amino-terminal
segment of nsp3 (Woo et al., 2005 [note: nsp3 is misidentified as nsp1 in
this reference]). In SARS-CoV, nsp3 contains a “SARS-unique” domain
that is not found in any other coronavirus (Snijder et al., 2003).

The coronavirus main proteinase, designated Mpro, constitutes all of
nsp5. This enzyme has also been called the 3C-like proteinase (3CLpro),
because of its resemblance to the 3C proteinases of picornaviruses.
Crystal structures have been solved for Mpro of HCoV-229E (Anand
et al., 2002), TGEV (Anand et al., 2003), and SARS-CoV (Yang et al.,
2003). These reveal that Mpro is a dimer, each monomer of which has a
three-domain structure, with an active site located in a cleft between
the first and second domains in each monomer. At the carboxy
terminus is an extra domain not found in the 3CLpro of other viral families.
Multiple structures determined for the SARS-CoV Mpro showed that
the entire molecule undergoes major pH-dependent conformational 
changes, which have been proposed to regulate activity. 

At the carboxy-terminal end of pp1a is a cluster of small proteins,
nsp7-nsp10. The crystal structure of SARS-CoV nsp9 was solved
independently by two groups (Egloff et al., 2004; Sutton et al., 2004). In
addition, prompted by features of the structure, investigators found
that nsp9 has nonspecific RNA-binding activity. Biophysical evidence
has also been presented for an interaction between nsp9 and nsp8
(Sutton et al., 2004). Therefore, although nsp9 was found to occur as
a dimer in the crystals, its natural binding partner may be nsp8. A
solution structure for SARS-CoV nsp7 was determined by NMR; this
structure showed potential protein-protein interaction surfaces for
this small polypeptide (Peti et al., 2005). Moreover, a cocrystal
structure of SARS-CoV nsp7 with nsp8 revealed a complex of eight
monomers of each protein forming a hollow cylindrical structure. This
hexadecameric assembly was proposed to be able to encircle an RNA
template, possibly acting as a processivity factor for the RNA
polymerase (Zhai et al., 2005). Thus, a picture of a putative complex of all four
of the nsp7-nsp10 polypeptides is being gradually pieced together, but,
as yet, there is a paucity of functional data to complement this wealth 
of structural information. 

Transmembrane domains in nsp3, nsp4, and nsp6 anchor the
replicase complex to intracellular membranes, and these proteins may be
involved in remodeling those membranes to form double-membrane
compartments that are dedicated to viral RNA synthesis (Bi et al.,
1999; Gosert et al., 2002; Prentice et al., 2004a; Shi et al., 1999; van
der Meer et al., 1999). These double-membrane vesicles, which
colocalize with nascent viral RNA, are distinct from the sites of virion
assembly and budding. Coronavirus RNA synthesis may thus take
place in structures that are similar to the autophagosomal RNA
synthesis compartments that have been characterized in
picornavirus-infected cells (Jackson et al., 2005). The nsp7-nsp10 products localize
in discrete perinuclear and cytoplasmic foci in infected cells (Bost et al.,
2000), in a membrane-associated complex that also includes nsp2. This
complex colocalizes with N protein and the viral helicase (nsp13) early
in infection. However, late in infection, N protein and the helicase 
segregate into biochemically distinct membranes in the ERGIC that 
also contain M protein, suggesting a role for the helicase in genome 
encapsidation or packaging (Bost et al., 2001; Sims et al., 2000). 
The postribosomal frameshift products of the replicase, nsp12-nsp16,
contain the actual enzymes of RNA replication and transcription.
The coronavirus RNA-dependent RNA polymerase (RdRp) is
contained within nsp12, the first part of pp1ab synthesized after
frameshifting. This protein has the fingers, palm, and thumb domains
common to a number of viral RdRps and reverse transcriptases. In
addition, the RdRp contains a very large, amino-terminal domain that
is unique to the coronaviruses. For MHV, the ability of the RdRp to
associate with intracellular membranes was mapped to a 38-amino
acid segment of the unique domain (Brockway et al., 2003). Membrane
association of expressed RdRp also depended on MHV infection,
indicating that other viral components are required for this targeting. In
addition, the RdRp was shown to form intermolecular associations
with Mpro, nsp8, and nsp9. For the SARS-CoV RdRp, preliminary
biochemical characterization of the bacterially expressed enzyme
suggests that the coronavirus-unique domain is essential for activity
(Cheng et al., 2005). 

Nsp13 contains multiple activities that have been extensively
characterized for HCoV-229E and SARS-CoV (Ivanov and Ziebuhr, 2004;
Ivanov et al., 2004a; Seybert et al., 2000). This protein is a helicase
with a highly processive duplex unwinding activity for both DNA and
RNA substrates. The nsp13 helicase unwinds with 5'-3' polarity,
suggesting that it has a role in preparing the template for the RdRp.
Nsp13 also has RNA-dependent NTPase and dNTPase activities,
which probably provide the energy for its translocation along RNA
templates. In addition, nsp13 is an RNA 5'-triphosphatase, making it
a candidate to carry out the initial step of RNA capping. 

Nsp14 and nsp15 have each been assigned ribonucleolytic functions.
Such activities would, at first glance, seem to be out of place in an RNA
virus. Nsp14 has been predicted to be an exonuclease (designated
ExoN), which, it is speculated, could be involved in an RNA processing
step integral to coronavirus transcription (Snijder et al., 2003).
This activity has not yet been demonstrated, but a point mutation in
nsp14 of MHV was shown to be markedly attenuating in the mouse
host (Sperry et al., 2005). Nsp15 is an endoribonuclease, designated
NendoU, that is found only in the nidoviruses (Snijder et al., 2003).
This enzyme, from HCoV-229E and SARS-CoV, has been shown to
hydrolyze both single- and double-stranded RNA, with a specificity
for cleavage immediately upstream and downstream of uridylate
residues (Bhardwaj et al., 2004; Ivanov et al., 2004b). NendoU exhibited
optimal activity with manganese ion, rather than magnesium ion,
and it was essentially inactive with 2'-O-ribose-methylated RNA
substrates (Ivanov et al., 2004b). Mutation of the active site of nsp15 of
HCoV-229E was found to be lethal.
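The reported specificity of NendoU can be phrased as a rule for listing candidate cleavage sites. The Python sketch below is a hypothetical illustration, not a model of the enzyme: it enumerates the phosphodiester bonds immediately 5' and 3' of each uridylate and, as a labeled assumption, excludes any bond whose 5' nucleotide carries a 2'-O-methyl group, on the premise that cleavage needs a free 2'-hydroxyl at that position. It makes no attempt to capture cleavage preferences, metal dependence, or the behavior of double-stranded substrates.

```python
from collections.abc import Collection


def candidate_nendou_sites(rna: str, methylated: Collection[int] = ()) -> list[int]:
    """Return indices b of bonds (between nucleotides b and b+1) that lie
    immediately upstream or downstream of a uridylate.

    Assumption (not from the text): a bond is skipped when nucleotide b,
    on its 5' side, is 2'-O-methylated, because cleavage is taken to
    require that nucleotide's 2'-OH.
    """
    rna = rna.upper()
    blocked = set(methylated)
    sites = set()
    for i, nt in enumerate(rna):
        if nt != "U":
            continue
        if i > 0:
            sites.add(i - 1)  # bond on the 5' side of the U
        if i < len(rna) - 1:
            sites.add(i)      # bond on the 3' side of the U
    return sorted(b for b in sites if b not in blocked)


print(candidate_nendou_sites("GCUAGUC"))                  # -> [1, 2, 4, 5]
print(candidate_nendou_sites("GCUAGUC", methylated={2}))  # -> [1, 4, 5]
```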

Finally, nsp16, the carboxy-terminal product of pp1ab, has been
predicted to contain 2'-O-methyltransferase activity (Snijder et al.,
2003; von Grotthuss et al., 2003 [note: nsp16 is misidentified as nsp13
in this reference]). Such an activity, which has not yet been
demonstrated, would have a most obvious role in RNA capping. However, the
possibility has been raised that 2'-O-methylation serves to protect a
segment of duplex RNA from the NendoU activity of nsp15 in one stage
of discontinuous negative-strand RNA synthesis (Ivanov et al., 2004b).
Relevant to RNA capping, it must be noted that if coronaviruses
possess their own guanylyltransferase or cap 7-methyltransferase
activities, these have not yet been identified among the many replicase
proteins. 

3. Host Factors 

RNA viruses often expropriate and redirect host cell components, to 
assist in mechanisms of their own gene expression (Ahlquist et al., 
2003). A number of host factors have been proposed to participate in 
coronavirus RNA synthesis. To date, all of these have been discovered 
with either MHV or BCoV, and all were originally identified on the 
basis of their ability to bind in vitro to RNA segments of functional 
importance. The most completely characterized coronavirus host factor
is heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1), which was
initially found as a member of a set of proteins that bound to the
negative strand of the MHV TRS (Furuya and Lai, 1993; Li et al.,
1997; Zhang and Lai, 1995). Its RNA-binding property, its affinity for
MHV N protein, and its propensity to dimerize all made hnRNP A1
attractive as a potential mediator of the antigenome looping-out event
envisaged by the leader-primed transcription model (Wang and Zhang,
1999; Zhang and Lai, 1995; Zhang et al., 1999). Overexpression of
hnRNP A1 was shown to result in a marked increase in the kinetics
of MHV RNA synthesis, suggesting that this factor affects genome
replication as well as transcription. Additionally, expression of a
truncated form of hnRNP A1 had a dominant-negative effect on MHV
replication (Shi et al., 2000). The role of hnRNP A1 was questioned on
the basis of the finding that MHV replication and RNA synthesis were
completely unimpaired in CB3 cells, a mutant murine cell line that
does not express hnRNP A1 (Ben-David et al., 1992; Shen and Masters,
2001). In addition, high-affinity hnRNP A1 binding sites (Burd and
Dreyfuss, 1994), when placed in the MHV genome, did not act in lieu of
a TRS and did not displace the site of leader-body fusion away from a
TRS (Shen and Masters, 2001). However, it was subsequently shown
that other hnRNP A/B family members, which are present in CB3 cells,
could replace hnRNP A1; further, overexpression of hnRNP A/B was
shown to enhance MHV RNA synthesis (Shi et al., 2003).

Other members of the hnRNP family have also been implicated in 
MHV RNA synthesis. Polypyrimidine tract-binding protein (PTB, also
known as hnRNP I) was shown to bind to pentanucleotide repeats
upstream of the positive-strand leader copy of the TRS (Li et al.,
1999). In addition, PTB bound the negative strand of the 3' UTR,
specifically at the complement of the invariant octanucleotide motif
(Huang and Lai, 1999). The positive strand of the same region of the
3' UTR was also bound by hnRNP A1, and deletions in this region
inhibited DI RNA synthesis (Huang and Lai, 2001). Another hnRNP,
synaptotagmin-binding cytoplasmic RNA-interacting protein (SYNCRIP),
was found to bind to both positive- and negative-strand MHV
RNA near the region of the leader pentanucleotide repeats (Choi et al.,
2004). Moreover, RNAi-mediated downregulation of SYNCRIP delayed
the kinetics of MHV RNA synthesis. In the BCoV 5' UTR, multiple 
complexes of six proteins have been found to bind specifically to the 
stem-loop IV that is required for DI RNA replication (Raman and 
Brian, 2005). It is not yet clear whether some of these proteins are 
previously identified hnRNPs or whether they represent new cellular 
factors. 

In the 3' UTR of MHV, a complex of proteins was found to bind to two 
similar 11-base motifs in positive-strand RNA, at distances of 26-36 
and 129-139 nucleotides from the poly(A) tail (Liu et al., 1997; Yu and
Leibowitz, 1995a, b). DI RNAs with mutations in either of these
elements were defective in replication. The largest member of the protein
complex was identified as mitochondrial aconitase, a protein not
previously known to have RNA-binding activity (Nanda and Leibowitz,
2001). Other components of the complex were then found to be the
chaperones HSP60, HSP40, and mitochondrial HSP70 (Nanda et al.,
2004). Although MHV replication does not have any known involvement
with mitochondria, both mitochondrial aconitase and mitochondrial
HSP70 have substantial cytoplasmic fractions. Finally, at the
furthest downstream ends of the genomes of MHV and BCoV, poly(A) 
binding protein binds to the poly(A) tail and appears to play a role 
in RNA synthesis beyond its function in translation (Spagnolo and 
Hogue, 2000). 
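The two 3' UTR elements are defined by their distance from the poly(A) tail rather than from the 5' end, so locating them in a sequence means counting backward from the tail. The Python sketch below does this under an assumed convention: position 1 is the nucleotide immediately upstream of the tail. The function name and the example motif are hypothetical and do not represent the published MHV sequence.

```python
def window_upstream_of_polya(genome: str, start: int, end: int) -> str:
    """Return positions start..end (inclusive), counted upstream of the poly(A) tail.

    Assumed convention: position 1 is the nucleotide immediately 5' of the tail.
    """
    seq = genome.upper()
    # Caution: rstrip also removes any A residues that end the 3' UTR itself,
    # so the UTR/tail boundary is only approximate in this sketch.
    body = seq.rstrip("A")
    if len(body) == len(seq):
        raise ValueError("no poly(A) tail found")
    if not 1 <= start <= end <= len(body):
        raise ValueError("window lies outside the non-poly(A) sequence")
    n = len(body)
    # Position p upstream of the tail sits at body index n - p.
    return body[n - end : n - start + 1]


if __name__ == "__main__":
    motif = "GGUCUCCUCGG"  # hypothetical 11-base element
    genome_3prime = "CG" * 60 + motif + "U" * 25 + "A" * 20
    print(window_upstream_of_polya(genome_3prime, 26, 36) == motif)  # True
```

An 11-base window corresponds to an inclusive range such as 26-36, which is why both ends of the range are passed explicitly.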

Among the array of candidate host factors in coronavirus RNA
synthesis, it remains to be established which are essential and which play
enhancing roles, either as RNA chaperones or in some other capacity.
Such assessments can be difficult, because many of these factors are
critical or essential to normal cellular functions. Thus, the validation of 
host factors will likely require the establishment of an efficient in vitro 
RNA replication and transcription system, in which reconstitution of 
coronavirus RNA synthesis can be achieved from isolated components 
and precursors. 


VI. Genetics and Reverse Genetics 

Numerous classical coronavirus mutants have been isolated over the 
past 25 years, mainly with MHV (Lai and Cavanagh, 1997). Mutants 
were either identified as naturally occurring viral variants (often on 
the basis of causing atypical pathogenesis), or else they were obtained 
through selection criteria such as escape from neutralization by
monoclonal antibodies. A number of sets of MHV mutants were generated by
chemical mutagenesis, followed by screening for temperature-sensitive
phenotypes (Koolen et al., 1983; Martin et al., 1988; Robb et al., 1979;
Schaad et al., 1990) or, in one case, for aberrant cytopathic effects or
plaque morphologies (Sturman et al., 1987). Although the latter search 
yielded an unusually high proportion of structural protein mutants, 
viruses with conditionally lethal, RNA-negative phenotypes were the 
predominant isolates in all searches. The arrangement of the
coronavirus genome dictates that the vast majority of randomly generated
mutations will fall in the replicase gene, owing to its large target size.
Despite assiduous efforts that applied classical genetic methods to the
study of the replicase (Baric et al., 1990; Fu and Baric, 1992, 1994;
Schaad et al., 1990), progress was limited by the technology available
at the time, and exploitation of the full value of these mutants would 
await the development of reverse genetic techniques. 
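The target-size argument is simple proportionality: if mutagenesis hits sites uniformly, the expected share of mutations landing in a gene equals that gene's share of the genome. The Python sketch below checks this by simulation. The lengths are illustrative assumptions, chosen only so that the replicase occupies roughly the upstream two-thirds of the genome, consistent with the statement in this section that targeted RNA recombination reaches only the downstream one-third. The sketch ignores silent and lethal mutations, which shape what is actually recovered.

```python
import random


def fraction_in_region(genome_len: int, region: tuple[int, int],
                       n_mutations: int, seed: int = 0) -> float:
    """Drop point mutations uniformly along the genome; report the share in [start, end)."""
    rng = random.Random(seed)
    start, end = region
    hits = sum(start <= rng.randrange(genome_len) < end for _ in range(n_mutations))
    return hits / n_mutations


if __name__ == "__main__":
    GENOME_LEN = 30_000      # illustrative round number, not a measured length
    REPLICASE = (0, 20_000)  # assumption: replicase = upstream ~two-thirds
    simulated = fraction_in_region(GENOME_LEN, REPLICASE, 100_000)
    expected = (REPLICASE[1] - REPLICASE[0]) / GENOME_LEN
    print(f"simulated {simulated:.3f}, expected {expected:.3f}")
```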

The basic blueprint for positive-strand RNA virus reverse genetics,
the transcription of infectious RNA from a full-length cDNA copy of the
viral genome, was established more than two decades ago with
poliovirus (Racaniello and Baltimore, 1981). It became possible only recently
to apply this scheme to coronaviruses, however, owing to the need to
surmount a number of formidable hurdles. Most notable were the
obstacles posed by the huge sizes of coronavirus genomes and the high
instabilities of various regions of the replicase gene when they were
propagated as cloned cDNA in E. coli. The first reverse genetic
system for coronaviruses, targeted RNA recombination, was developed to
circumvent these barriers, at a time when it was far from clear whether
the construction of full-length infectious cDNA clones would ever be
technically feasible (Masters, 1999; Masters and Rottier, 2005). This
method, originally developed in MHV, takes advantage of the high rate 
of homologous RNA recombination in coronaviruses. A synthetic donor 
RNA bearing the mutation of interest is introduced into cells that have 
been infected with a recipient parent virus possessing some
characteristic that can be selected against. Mutant recombinants that arise
among progeny viruses are then identified by counterselection of the 
recipient parent virus. 
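The logic of recombination followed by counterselection can be illustrated with a toy model in Python. It assumes, for simplicity, a single crossover placed uniformly within the region covered by the donor RNA, which is taken to be colinear with the 3' end of the genome. Counterselection keeps only recombinants that inherited donor sequence at the selectable site, and the question is how often those survivors also carry a marker mutation. All coordinates and names are hypothetical, and double crossovers, surviving parental virus and fitness effects are ignored.

```python
import random


def simulate_recombination(n_events: int, donor_start: int, genome_end: int,
                           selected_pos: int, marker_pos: int, seed: int = 0):
    """Toy single-crossover model of targeted RNA recombination.

    The donor spans [donor_start, genome_end). Each event places one crossover
    uniformly at x, giving recipient[:x] + donor[x:], so the recombinant
    carries the donor allele at any position p >= x.
    Returns (number selected, number selected that also carry the marker).
    """
    rng = random.Random(seed)
    selected = with_marker = 0
    for _ in range(n_events):
        x = rng.randrange(donor_start, genome_end)
        if x <= selected_pos:    # donor allele at the selectable site: survives
            selected += 1
            if x <= marker_pos:  # donor allele at the marker as well
                with_marker += 1
    return selected, with_marker


if __name__ == "__main__":
    # Hypothetical coordinates in nucleotides.
    sel, mk = simulate_recombination(50_000, 20_000, 30_000,
                                     selected_pos=22_000, marker_pos=28_000)
    print(mk / sel)  # 1.0: a marker downstream of the selected site is always co-inherited
    sel, mk = simulate_recombination(50_000, 20_000, 30_000,
                                     selected_pos=22_000, marker_pos=21_000)
    print(round(mk / sel, 2))  # close to 0.5 for this upstream marker
```

In this model, a marker downstream of the selectable site travels with every survivor. A marker lying between the 5' end of the donor and the selectable site is carried by only a fraction of survivors, proportional to its distance from the donor's 5' end.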

The earliest form of targeted RNA recombination employed, as the 
recipient parent virus, a classical MHV mutant that was thermolabile 
owing to an internal deletion in the N gene (Koetzner et al., 1992; 
Peng et al., 1995a), which is the 3'-most gene in the genome. Mutations 
were introduced into the N gene or the 3' UTR by means of
in vitro-synthesized donor RNAs corresponding to the smallest MHV sgRNA.
Recombinants, which were identified as survivors of a heat-killing 
selection, had restored the region deleted in the parent virus and, 
concomitantly, had acquired marker mutations planted in the donor 
RNA. The efficiency of this system was subsequently increased by the 
incorporation of 5'-cis-acting elements that converted the donor RNA
into a replicating DI RNA (Masters et al., 1994; van der Most et al.,
1992). The scope of this technique was then extended through the 
addition of 3'-contiguous genomic sequence to donor RNAs, ultimately 
allowing reverse-genetic access to all of the structural genes of MHV 
(Fischer et al., 1997a,b, 1998; Peng et al., 1995b). The strength and 
versatility of targeted RNA recombination were substantially enhanced 
as a result of the construction of the interspecies coronavirus mutant 
fMHV, a chimera in which the S protein ectodomain of MHV was
replaced by the S protein ectodomain from FIPV (Kuo et al., 2000). This
replacement resulted in a virus that had acquired the ability to grow in 
feline cells and had simultaneously lost the ability to grow in murine 
cells. Although the immediate rationale for the creation of fMHV was 
to dissect domain requirements for virion assembly (Section IV.B.2), it 
was readily apparent that this chimera offered a tremendous selective 
advantage in targeted RNA recombination. The use of fMHV as the 
recipient parent virus allowed the selection of recombinants
harboring virtually any nonlethal MHV mutation in the 3'-most 10 kb of
the genome, on the basis of their having regained the ability to grow 
in murine cells. Numerous mutants, many with extremely fragile 
phenotypes, have since been obtained by this method (de Haan et al., 
2002a,b; Goebel et al., 2004a,b; Hurst et al., 2005; Kuo et al., 2002, 
2003). The generality of this host-range-based selection system has 
been established by the extension of the method to another strain of
MHV (Ontiveros et al., 2001) and by use of an analogous chimera,
mFIPV, for the construction of FIPV mutants (Haijema et al., 2003, 
2004). 

Despite its value, however, targeted RNA recombination can be used 
to engineer only the downstream one-third of the genome. The
full scope of reverse genetics did not become available to coronavirus
research until relatively recently. Through the exceptional
perseverance and inventiveness of three independent laboratories, systems
based on full-length cDNA clones have been developed, each using 
a different strategy to overcome the stability problems inherent to 
coronavirus cDNA. These systems all provide a capability of great 
importance that is effectively beyond the scope of targeted RNA
recombination: access to the replicase gene. In the first such method
(Enjuanes et al., 2005), a full-length cDNA copy of the TGEV genome
was assembled in a low-copy-number bacterial artificial chromosome
(BAC) vector. Infectious coronavirus RNA was produced in this system
by a “DNA launch”: in vivo nuclear transcription by host RNA
polymerase II from an engineered CMV promoter (Almazan et al., 2000).
The DNA launch ensured complete capping of the viral RNA, and it 
bypassed potential limitations of the system arising from the efficiency 
of in vitro transcription of genomic RNA. Heterologous sequence was 
removed from the 3' end of the transcribed RNA through the action of 
an incorporated hepatitis delta virus ribozyme. Further stabilization 
of the full-length BAC clone in bacteria was achieved through the 
insertion of a eukaryotic intron into either of two positions in 
the mapped toxic region of the TGEV cDNA (Gonzalez et al., 2002). 
This allowed stable propagation of the BAC for over 200 bacterial 
generations. 

In the second method, full-length genomic cDNAs were assembled 
by in vitro ligation of smaller, more stable subcloned cDNAs (Baric and 
Sims, 2005). Infectious RNA was then transcribed in vitro from the 
ligated product. The boundaries of the subcloned genomic cDNA
fragments were chosen so as to allow ease of manipulation for site-directed
mutagenesis applications. Most importantly, some fragment
boundaries were arranged in such a way as to interrupt regions of cloned
cDNA instability. This is essentially the same scheme that had been
used earlier to produce infectious RNA for yellow fever virus, a
flavivirus (Rice et al., 1989). However, for coronaviruses, the scheme had to
be executed on a much grander scale, with five to seven fragments
instead of two. To facilitate this approach, a key innovation was
introduced: the unique assembly of fragments was directed by means of
nonsymmetric overhangs generated by restriction enzymes that cut
at a distance from their recognition sequences. This ensured that the
fragments became connected in a predetermined order by ligation,
without the generation of rearranged byproducts. Originally
demonstrated with TGEV (Yount et al., 2000), this in vitro assembly
technique has subsequently been successfully used to engineer the
genomes of MHV (Yount et al., 2002), SARS-CoV (Yount et al., 2003),
and IBV (Youn et al., 2005).
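The value of nonsymmetric overhangs is that each junction can pair with only one partner, so ligation can proceed in just one order. The Python sketch below models that constraint abstractly. Each fragment is reduced to its two end overhangs. A palindromic overhang (one equal to its own reverse complement) is rejected because it could join two copies of the same end, and overhangs must be unique up to reverse complementation. The fragment names and 4-nt overhang sequences are hypothetical, strand chemistry is simplified to top-strand labels, and no particular restriction enzyme is implied.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Fragment:
    name: str
    left: str   # overhang at the upstream end ("" marks the start of the genome)
    right: str  # overhang at the downstream end ("" marks the end of the genome)


def check_overhangs(fragments: list[Fragment]) -> None:
    """Reject overhangs that are palindromic or that could pair at more than one junction."""
    seen: set[str] = set()
    for frag in fragments:
        oh = frag.right
        if not oh:
            continue
        if oh == revcomp(oh):
            raise ValueError(f"{frag.name}: palindromic overhang {oh} can self-ligate")
        if oh in seen or revcomp(oh) in seen:
            raise ValueError(f"{frag.name}: overhang {oh} is not unique")
        seen.update({oh, revcomp(oh)})


def assemble(fragments: list[Fragment]) -> list[str]:
    """Chain fragments by matching each right overhang to the next left overhang."""
    check_overhangs(fragments)
    by_left = {f.left: f for f in fragments}
    if len(by_left) != len(fragments) or "" not in by_left:
        raise ValueError("need exactly one start fragment and unique left overhangs")
    order = [by_left[""]]
    while order[-1].right:
        nxt = by_left.get(order[-1].right)
        if nxt is None:
            raise ValueError(f"no fragment accepts overhang {order[-1].right}")
        order.append(nxt)
    if len(order) != len(fragments):
        raise ValueError("some fragments were left unconnected")
    return [f.name for f in order]


if __name__ == "__main__":
    # Hypothetical five-fragment genome with 4-nt overhangs, supplied out of order.
    frags = [
        Fragment("D", "GACT", "CAAG"),
        Fragment("A", "", "ACGA"),
        Fragment("E", "CAAG", ""),
        Fragment("C", "TTGC", "GACT"),
        Fragment("B", "ACGA", "TTGC"),
    ]
    print(assemble(frags))  # ['A', 'B', 'C', 'D', 'E']
```

Changing any junction to a palindromic overhang such as GATC makes `check_overhangs` raise an error, the abstract counterpart of the rearranged byproducts the approach was designed to avoid.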

In the third method, entire coronavirus cDNAs, generated by long-range
RT-PCR (Thiel et al., 1997), were inserted into a unique restriction
site in the genome of vaccinia virus (Thiel and Siddell, 2005). In
this scheme, vaccinia virus served as a huge cloning vehicle, in 
which the coronavirus genome cDNAs did not exhibit the instabilities 
encountered in E. coli plasmids. Infectious RNA was produced by 
in vitro transcription from purified vaccinia virus DNA (Thiel et al., 
2001a). Alternatively, a DNA launch was carried out in vivo with 
transfected cDNA and fowlpox-encoded T7 RNA polymerase (Casais 
et al., 2001). The use of vaccinia as a vector has allowed manipulation
of the resulting cloned cDNA by any of the suite of methods that
have been developed for poxvirus reverse genetics. In particular,
transient dominant selection has been used to carry out site-directed
mutagenesis (Britton et al., 2005). Engineered mutations have also been
directly recombined from PCR products into vaccinia clones, through 
exploitation of both negative and positive selection of a gpt cassette 
(Coley et al., 2005). A further innovation came from the rescue of 
recombinant coronaviruses from cell lines expressing N protein, given 
that N protein has been shown to greatly enhance recovery of virus in 
all three full-length cDNA systems (Almazan et al., 2004; Schelle et al., 
2005; Thiel et al., 2001a; Yount et al., 2002). This poxvirus-vectored 
technique was originally applied to HCoV-229E (Thiel et al., 2001a), 
and it has since been used to engineer the genomes of IBV (Casais 
et al., 2001) and MHV (Coley et al., 2005). 

Each of the two main reverse genetic approaches has its
own advantages. For reverse genetic studies involving coronavirus
structural genes or the 3' UTR, targeted RNA recombination is
currently the easier system to manipulate, and it has the power to 
recover extremely defective mutants. Another asset of targeted RNA 
recombination is that it lends itself well to studies involving domain 
exchange between different proteins (Peng et al., 1995b) or the
exchange of genomic elements (Hsue and Masters, 1997). In these cases,
the system, through its own selection of allowable crossover sites, can 
reveal which substitutions retain functionality and which are lethal. 
On the other hand, full-length cDNA reverse-genetic strategies provide
the capacity to site-specifically mutagenize the exceedingly large viral
RNA replicase gene. This advantage is just beginning to be exploited,
and it can be expected to play a major role in the future in the acquisition
of an understanding of the workings of the complex RNA synthesis 
machinery. In addition to molecular biological studies, coronavirus 
reverse-genetic investigations have opened the door to the
development of these viruses, and their derivative replicons, as vaccines
(Alonso et al., 2002; Haijema et al., 2004), expression systems (de Haan
et al., 2003b, 2005), and gene delivery vectors (Thiel et al., 2001b,
2003b).